0%| | 0/2230 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:17,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:18,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:19,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:20,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:21,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:23,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:24,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:25,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:26,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:26,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:27,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 16:59:28,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:29,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.7584, 'learning_rate': 0.0, 'epoch': 0.0} [WARNING|modeling_utils.py:388] 2022-03-23 16:59:30,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 1/2230 [00:14<9:11:19, 14.84s/it][WARNING|modeling_bart.py:1051] 2022-03-23 16:59:31,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:32,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:33,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:33,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:35,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:35,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:36,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:37,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:38,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 16:59:39,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:40,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:40,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:41,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:42,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 8.5833, 'learning_rate': 0.0, 'epoch': 0.0} [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:43,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:44,364 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 2/2230 [00:28<8:53:01, 14.35s/it][WARNING|modeling_bart.py:1051] 2022-03-23 16:59:45,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:46,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:47,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:47,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:48,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 16:59:49,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:50,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:51,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:52,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:53,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:54,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:54,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:55,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 16:59:56,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 16:59:57,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.5559, 'learning_rate': 6e-07, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-03-23 16:59:58,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 3/2230 [00:42<8:47:15, 14.21s/it][WARNING|modeling_bart.py:1051] 2022-03-23 16:59:59,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:00,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:01,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:01,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:02,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:03,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:04,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:05,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:06,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:06,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:08,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:08,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:09,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:10,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:11,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:12,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 4/2230 [00:56<8:39:22, 14.00s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:00:13,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:13,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:14,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:15,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:16,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:17,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:18,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:18,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:19,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:20,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:21,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:22,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:23,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:23,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 8.6015, 'learning_rate': 1.8e-06, 'epoch': 0.01} [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:24,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:25,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 5/2230 [01:10<8:33:10, 13.84s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:00:26,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:27,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:28,421 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:29,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:30,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:30,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:31,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:32,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:34,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:35,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:35,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:36,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:37,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:38,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.402, 'learning_rate': 2.4e-06, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-03-23 17:00:39,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 6/2230 [01:23<8:28:42, 13.72s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:00:40,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:40,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:41,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:42,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:43,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:44,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:45,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:45,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:46,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:47,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:48,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:49,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:50,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:00:50,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:51,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:52,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 7/2230 [01:36<8:23:30, 13.59s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:00:53,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:54,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:55,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:55,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:56,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:57,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:00:58,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:00:59,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:00,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:01:00,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:01,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:02,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:03,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:04,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:05,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.9299, 'learning_rate': 3.6e-06, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-23 17:01:05,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 8/2230 [01:50<8:20:32, 13.52s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:01:06,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:07,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:08,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:09,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:10,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:01:10,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:11,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:12,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:13,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:14,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:15,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:16,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:17,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:18,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:18,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 9/2230 [02:03<8:17:02, 13.43s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:01:20,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 7.6735, 'learning_rate': 4.2e-06, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-23 17:01:20,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:21,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:22,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:23,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:24,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:25,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:25,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:26,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:27,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:28,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:28,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:30,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:01:30,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:31,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:32,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 7.5134, 'learning_rate': 4.8e-06, 'epoch': 0.02} 0%|▎ | 10/2230 [02:16<8:15:35, 13.39s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:01:33,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:34,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:35,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:35,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:36,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:37,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:38,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:39,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:01:40,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:41,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:42,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:43,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:43,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:44,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:45,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▍ | 11/2230 [02:29<8:11:17, 13.28s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:01:46,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.3091, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-23 17:01:47,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:48,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:48,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:49,672 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:01:50,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:51,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:51,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:52,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:53,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:54,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:55,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:56,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:56,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:01:57,623 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:01:58,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 12/2230 [02:42<8:06:20, 13.16s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:01:59,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 7.1164, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-03-23 17:01:59,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:00,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:01,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:02,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:03,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:04,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:05,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:06,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:06,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:07,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:08,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:09,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:09,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:10,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.7007, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-03-23 17:02:11,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 13/2230 [02:55<8:06:25, 13.16s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:02:12,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:13,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:14,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:14,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:15,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:16,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:17,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:17,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:18,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:19,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:20,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:21,047 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:22,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:22,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:23,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:24,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 14/2230 [03:08<8:02:09, 13.05s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:02:25,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.6293, 'learning_rate': 7.2e-06, 'epoch': 0.03} [WARNING|modeling_utils.py:388] 2022-03-23 17:02:25,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:26,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:27,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:28,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:28,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:30,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:30,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:31,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:32,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:33,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:33,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:34,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:35,289 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 6.3926, 'learning_rate': 7.799999999999998e-06, 'epoch': 0.03} [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:36,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:36,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 15/2230 [03:21<7:57:41, 12.94s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:02:37,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:38,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:39,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:40,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:41,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:41,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:42,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:43,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:44,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:44,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:45,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:46,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:47,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:47,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 6.3405, 'learning_rate': 8.4e-06, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:48,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:49,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 16/2230 [03:33<7:52:52, 12.82s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:02:50,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:51,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:52,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:52,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:53,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:54,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:55,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:55,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:56,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:02:57,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:58,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:02:58,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:02:59,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:00,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 6.1208, 'learning_rate': 8.999999999999999e-06, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:01,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:01,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 17/2230 [03:46<7:48:45, 12.71s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:03:02,927 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:03,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:04,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:04,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:05,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:03:06,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:07,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:08,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:09,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:09,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:10,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:11,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:12,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:12,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 6.0531, 'learning_rate': 9.6e-06, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:13,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:14,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 18/2230 [03:58<7:44:05, 12.59s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:03:15,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:03:15,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:16,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:17,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:18,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:18,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:19,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:20,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:21,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:21,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:22,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:23,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:24,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:03:24,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 5.7463, 'learning_rate': 1.02e-05, 'epoch': 0.04} [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:25,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:26,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 19/2230 [04:10<7:40:13, 12.49s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:03:27,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:28,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:28,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:29,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:30,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:31,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:32,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:32,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:33,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:03:34,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:35,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:35,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:36,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:37,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:38,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:38,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 20/2230 [04:23<7:37:52, 12.43s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:03:39,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:40,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:41,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:41,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:42,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:03:43,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:44,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:44,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:45,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:46,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:47,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:47,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:48,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:49,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:50,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:50,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 21/2230 [04:35<7:34:20, 12.34s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:03:51,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 5.6696, 'learning_rate': 1.14e-05, 'epoch': 0.05} [WARNING|modeling_utils.py:388] 2022-03-23 17:03:52,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:53,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:53,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:54,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:55,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:56,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:56,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:57,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:58,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:03:59,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:03:59,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:00,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:04:01,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:02,314 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:02,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 22/2230 [04:47<7:30:38, 12.25s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:03,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.6137, 'learning_rate': 1.1999999999999999e-05, 'epoch': 0.05} [WARNING|modeling_utils.py:388] 2022-03-23 17:04:04,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:05,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:05,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:06,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:07,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:08,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:08,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:09,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:04:10,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:11,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:11,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:12,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:13,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:14,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:14,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 23/2230 [04:59<7:27:38, 12.17s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:15,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:16,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 5.4833, 'learning_rate': 1.26e-05, 'epoch': 0.05} [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:17,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:17,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:18,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:04:19,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:20,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:20,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:21,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:22,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:23,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:23,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:24,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:25,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:26,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:04:26,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 24/2230 [05:11<7:24:55, 12.10s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:04:28,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 24/2230 [05:11<7:24:55, 12.10s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:30,744 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:30,744 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:33,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:36,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 25/2230 [05:23<7:25:04, 12.11s/it] Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 25/2230 [05:23<7:25:04, 12.11s/it] Setting `use_cache=False`...1] 2022-03-23 17:04:27,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 25/2230 [05:23<7:25:04, 12.11s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:42,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:42,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:45,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 26/2230 [05:35<7:20:31, 11.99s/it] Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 26/2230 [05:35<7:20:31, 11.99s/it] Setting `use_cache=False`...1] 2022-03-23 17:04:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 26/2230 [05:35<7:20:31, 11.99s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:04:51,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:04:54,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:04:51,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-23 17:04:57,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:00,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▉ | 27/2230 [05:46<7:16:29, 11.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:03,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:06,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:09,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:11,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▉ | 28/2230 [05:58<7:11:52, 11.77s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:14,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:17,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:20,434 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:23,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█ | 29/2230 [06:09<7:07:33, 11.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:26,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:28,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:31,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:34,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█ | 30/2230 [06:20<7:02:54, 11.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:37,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:40,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:42,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:45,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█ | 31/2230 [06:32<6:58:46, 11.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:48,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:51,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:53,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:56,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█▏ | 32/2230 [06:43<6:53:42, 11.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:05:59,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:02,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:04,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:07,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█▏ | 33/2230 [06:54<6:49:53, 11.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:10,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:13,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:15,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:18,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▏ | 34/2230 [07:04<6:41:57, 10.98s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:20,943 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:23,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:26,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:28,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▏ | 35/2230 [07:14<6:35:38, 10.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:31,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:33,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:38,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▎ | 36/2230 [07:25<6:29:33, 10.65s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:41,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.8945, 'learning_rate': 2.04e-05, 'epoch': 0.08}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:44,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:46,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:49,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▎ | 37/2230 [07:35<6:21:55, 10.45s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:51,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:53,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:56,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:06:59,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▎ | 38/2230 [07:45<6:18:32, 10.36s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:01,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:04,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:06,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:08,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▍ | 39/2230 [07:54<6:07:58, 10.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:11,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:13,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:15,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:17,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▍ | 40/2230 [08:03<5:56:57, 9.78s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:20,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:22,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:24,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:26,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▍ | 41/2230 [08:12<5:43:31, 9.42s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:28,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:30,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:32,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:34,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▍ | 42/2230 [08:20<5:28:53, 9.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:36,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:38,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:40,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:42,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▌ | 43/2230 [08:28<5:11:51, 8.56s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:44,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:45,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:49,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▌ | 44/2230 [08:34<4:52:21, 8.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:50,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:52,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:55,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▌ | 45/2230 [08:41<4:32:24, 7.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:57,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:07:59,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:00,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▋ | 46/2230 [08:46<4:09:31, 6.86s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:02,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:04,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▋ | 47/2230 [08:51<3:46:49, 6.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:07,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:09,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▋ | 48/2230 [08:55<3:24:16, 5.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:11,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:12,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▋ | 49/2230 [08:59<3:02:34, 5.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:14,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:17,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▊ | 50/2230 [09:02<2:46:50, 4.59s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:19,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6526, 'learning_rate': 2.88e-05, 'epoch': 0.11}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:23,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:26,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:30,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▊ | 51/2230 [09:17<4:34:27, 7.56s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:33,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:37,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:40,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:44,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▊ | 52/2230 [09:31<5:45:14, 9.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:47,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.7033, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.12}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:51,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:54,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:08:58,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▉ | 53/2230 [09:44<6:31:07, 10.78s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:01,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:04,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:08,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:11,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▉ | 54/2230 [09:58<7:00:50, 11.60s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:15,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:18,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:21,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:25,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▉ | 55/2230 [10:11<7:20:54, 12.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.7961, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.12}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:09:31,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
3%|█▉ | 56/2230 [10:25<7:33:08, 12.51s/it]
{'loss': 4.9426, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.13}
{'loss': 4.9681, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.13}
{'loss': 4.9621, 'learning_rate': 3.36e-05, 'epoch': 0.13}
{'loss': 4.9065, 'learning_rate': 3.42e-05, 'epoch': 0.13}
{'loss': 4.7891, 'learning_rate': 3.48e-05, 'epoch': 0.13}
[WARNING|modeling_utils.py:388] 2022-03-23 17:10:42,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8069, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.14}
3%|██▏ | 62/2230 [11:43<7:44:51, 12.86s/it]
{'loss': 4.8273, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.14}
{'loss': 4.7276, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.14}
{'loss': 4.7894, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.14}
{'loss': 4.7477, 'learning_rate': 3.78e-05, 'epoch': 0.15}
3%|██▎ | 66/2230 [12:33<7:36:05, 12.65s/it]
{'loss': 4.6233, 'learning_rate': 3.84e-05, 'epoch': 0.15}
3%|██▎ | 67/2230 [12:46<7:32:04, 12.54s/it]
{'loss': 4.7465, 'learning_rate': 3.9e-05, 'epoch': 0.15}
3%|██▍ | 68/2230 [12:58<7:29:07, 12.46s/it]
{'loss': 4.7111, 'learning_rate': 3.96e-05, 'epoch': 0.15}
{'loss': 4.6527, 'learning_rate': 4.08e-05, 'epoch': 0.16}
{'loss': 4.7916, 'learning_rate': 4.14e-05, 'epoch': 0.16}
{'loss': 4.8087, 'learning_rate': 4.2e-05, 'epoch': 0.16}
{'loss': 4.6922, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.16}
{'loss': 4.7028, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.17}
3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]
{'loss': 4.7345, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.17} 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▋ | 75/2230 [14:22<7:10:35, 11.99s/it]g-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:13:52,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:13:52,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:09:28,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.6925, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-23 17:14:11,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6708, 'learning_rate': 4.56e-05, 'epoch': 0.17}
{'loss': 4.6112, 'learning_rate': 4.62e-05, 'epoch': 0.18}
{'loss': 4.7209, 'learning_rate': 4.68e-05, 'epoch': 0.18}
{'loss': 4.6872, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.18}
[WARNING|modeling_utils.py:388] 2022-03-23 17:14:54,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6316, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.18}
[WARNING|modeling_utils.py:388] 2022-03-23 17:15:02,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7507, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.19}
[WARNING|modeling_utils.py:388] 2022-03-23 17:15:16,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6468, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.19}
[WARNING|modeling_utils.py:388] 2022-03-23 17:15:26,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6845, 'learning_rate': 4.98e-05, 'epoch': 0.19}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:15:35,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
4%|███ | 86/2230 [16:22<6:13:57, 10.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:15:45,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6043, 'learning_rate': 5.1e-05, 'epoch': 0.2}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:15:51,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:15:57,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5494, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.2}
[WARNING|modeling_utils.py:388] 2022-03-23 17:16:01,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:05,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
4%|███▏ | 89/2230 [16:51<5:52:14, 9.87s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:16:09,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:16:11,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:16:13,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:16:15,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7288, 'learning_rate': 5.279999999999999e-05, 'epoch': 0.2}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:19,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:21,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:23,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:25,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:27,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:29,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:31,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:33,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:35,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:37,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:40,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:42,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:43,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:45,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:48,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:49,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:51,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:53,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:56,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:57,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:16:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:02,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:04,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:06,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:07,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:10,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:11,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3868, 'learning_rate': 5.88e-05, 'epoch': 0.22}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:15,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:19,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:22,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:26,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.9542, 'learning_rate': 5.88e-05, 'epoch': 0.23}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:29,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:33,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:36,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:39,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:43,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:46,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:50,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:53,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.5644, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.23}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:17:56,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:18:00,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:18:03,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:18:06,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:18:10,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:18:13,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9764, 'learning_rate': 6.12e-05, 'epoch': 0.24}
{'loss': 4.8396, 'learning_rate': 6.18e-05, 'epoch': 0.24}
{'loss': 4.7774, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.24}
{'loss': 4.7074, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.24}
5%|███▊ | 109/2230 [19:57<7:34:03, 12.84s/it]
{'loss': 4.6497, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.24}
{'loss': 4.7453, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.25}
{'loss': 4.7379, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.25}
{'loss': 4.7307, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.25}
5%|███▉ | 113/2230 [20:49<7:34:09, 12.87s/it]
{'loss': 4.6858, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.25}
{'loss': 4.5898, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.26}
{'loss': 4.568, 'learning_rate': 6.72e-05, 'epoch': 0.26}
{'loss': 4.685, 'learning_rate': 6.78e-05, 'epoch': 0.26}
{'loss': 4.5926, 'learning_rate': 6.84e-05, 'epoch': 0.26}
{'loss': 4.691, 'learning_rate': 6.9e-05, 'epoch': 0.26}
{'loss': 4.6207, 'learning_rate': 6.96e-05, 'epoch': 0.27}
{'loss': 4.6662, 'learning_rate': 7.02e-05, 'epoch': 0.27}
{'loss': 4.4633, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.27}
{'loss': 4.5675, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.27}
[WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5613, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.28}
[WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5392, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.28} [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6394, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.28} [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:22:05,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6477, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.28} g-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 17:15:39,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.5763, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.28}
{'loss': 4.5374, 'learning_rate': 7.5e-05, 'epoch': 0.29}
{'loss': 4.469, 'learning_rate': 7.56e-05, 'epoch': 0.29}
6%|████▌ | 130/2230 [24:12<6:37:01, 11.34s/it]
{'loss': 4.6234, 'learning_rate': 7.62e-05, 'epoch': 0.29}
{'loss': 4.4815, 'learning_rate': 7.68e-05, 'epoch': 0.29}
6%|████▌ | 132/2230 [24:33<6:28:20, 11.11s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:23:52,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 133/2230 [24:44<6:23:11, 10.96s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:06,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▋ | 134/2230 [24:54<6:16:48, 10.79s/it]
{'loss': 4.512, 'learning_rate': 7.86e-05, 'epoch': 0.3}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:24:18,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▋ | 135/2230 [25:05<6:11:00, 10.63s/it]
{'loss': 4.6734, 'learning_rate': 7.92e-05, 'epoch': 0.3}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:24:28,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▊ | 136/2230 [25:15<6:04:58, 10.46s/it]
{'loss': 4.5342, 'learning_rate': 7.98e-05, 'epoch': 0.3}
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:36,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▊ | 137/2230 [25:24<5:57:44, 10.26s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:42,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▊ | 138/2230 [25:34<5:54:37, 10.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:24:51,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:55,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:24:59,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.637, 'learning_rate': 8.16e-05, 'epoch': 0.31}
[WARNING|modeling_utils.py:388] 2022-03-23 17:25:05,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:25:07,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6462, 'learning_rate': 8.22e-05, 'epoch': 0.31}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:11,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:13,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:15,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▉ | 141/2230 [26:01<5:20:02, 9.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:17,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:19,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:21,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:23,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▉ | 142/2230 [26:08<5:04:01, 8.74s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:25,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:26,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:28,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|█████ | 143/2230 [26:16<4:46:03, 8.22s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:32,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:33,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:35,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|█████ | 144/2230 [26:22<4:27:32, 7.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:38,478 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:39,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:41,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████ | 145/2230 [26:28<4:09:49, 7.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:44,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:45,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:48,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████ | 146/2230 [26:33<3:49:14, 6.60s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:49,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:51,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████▏ | 147/2230 [26:38<3:28:23, 6.00s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:54,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:56,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████▏ | 148/2230 [26:42<3:08:07, 5.42s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:58,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:25:59,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:01,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:02,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:03,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████▏ | 150/2230 [26:49<2:33:45, 4.44s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:09,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:09,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:13,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:13,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:16,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:16,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:06,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▎ | 151/2230 [27:03<4:16:28, 7.40s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▎ | 151/2230 [27:03<4:16:28, 7.40s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:23,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:23,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:27,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:30,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:30,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:26:30,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 17:26:20,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 7%|█████▎ | 152/2230 [27:17<5:22:49, 9.32s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:26:34,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:37,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:40,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:44,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████▎ | 153/2230 [27:31<6:07:05, 10.60s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:26:47,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:51,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:54,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:26:57,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
7%|█████▍ | 154/2230 [27:44<6:36:23, 11.46s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:27:01,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:27:04,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:27:07,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.8415, 'learning_rate': 9.12e-05, 'epoch': 0.35}
{'loss': 4.6594, 'learning_rate': 9.18e-05, 'epoch': 0.35}
{'loss': 4.6, 'learning_rate': 9.24e-05, 'epoch': 0.35}
7%|█████▌ | 158/2230 [28:37<7:20:02, 12.74s/it]
{'loss': 4.6704, 'learning_rate': 9.36e-05, 'epoch': 0.36}
{'loss': 4.7041, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.36}
7%|█████▋ | 161/2230 [29:16<7:22:35, 12.83s/it]
{'loss': 4.7007, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.36}
7%|█████▋ | 162/2230 [29:28<7:20:33, 12.78s/it]
{'loss': 4.6682, 'learning_rate': 9.539999999999999e-05, 'epoch': 0.36}
{'loss': 4.6197, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.37}
{'loss': 4.62, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.37}
{'loss': 4.4891, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.37}
{'loss': 4.5451, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.37}
{'loss': 4.4452, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.37}
{'loss': 4.5057, 'learning_rate': 9.9e-05, 'epoch': 0.38}
{'loss': 4.5834, 'learning_rate': 9.96e-05, 'epoch': 0.38}
{'loss': 4.5699, 'learning_rate': 0.0001002, 'epoch': 0.38}
{'loss': 4.5678, 'learning_rate': 0.0001008, 'epoch': 0.38}
{'loss': 4.6897, 'learning_rate': 0.0001014, 'epoch': 0.39}
[WARNING|modeling_utils.py:388] 2022-03-23 17:30:53,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
8%|██████ | 173/2230 [31:43<6:53:01, 12.05s/it]
{'loss': 4.6041, 'learning_rate': 0.0001026, 'epoch': 0.39}
[WARNING|modeling_utils.py:388] 2022-03-23 17:31:13,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6913, 'learning_rate': 0.00010319999999999999, 'epoch': 0.39}
8%|██████▏ | 176/2230 [32:19<6:45:02, 11.83s/it]
{'loss': 4.4995, 'learning_rate': 0.00010379999999999999, 'epoch': 0.39}
{'loss': 4.4945, 'learning_rate': 0.00010439999999999999, 'epoch': 0.4}
{'loss': 4.5747, 'learning_rate': 0.00010499999999999999, 'epoch': 0.4}
8%|██████▎ | 179/2230 [32:52<6:32:39, 11.49s/it]
{'loss': 4.631, 'learning_rate': 0.00010559999999999998, 'epoch': 0.4}
[WARNING|modeling_utils.py:388] 2022-03-23 17:32:19,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6599, 'learning_rate': 0.00010619999999999998, 'epoch': 0.4}
{'loss': 4.5648, 'learning_rate': 0.00010679999999999998, 'epoch': 0.41}
8%|██████▎ | 182/2230 [33:25<6:20:11, 11.14s/it]
{'loss': 4.5608, 'learning_rate': 0.00010739999999999998, 'epoch': 0.41}
[WARNING|modeling_utils.py:388] 2022-03-23 17:32:53,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:03,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:14,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:20,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4157, 'learning_rate': 0.00010979999999999999, 'epoch': 0.42}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:33:28,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4919, 'learning_rate': 0.00011039999999999999, 'epoch': 0.42}
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:36,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:42,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4482, 'learning_rate': 0.00011099999999999999, 'epoch': 0.42}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:33:46,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:33:50,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 17:33:54,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:33:56,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:33:58,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 190/2230 [34:44<5:23:58, 9.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:01,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:03,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:05,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:07,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 191/2230 [34:53<5:11:25, 9.16s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:09,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:11,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:13,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:15,181 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 192/2230 [35:01<4:57:31, 8.76s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:17,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:19,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:20,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:22,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 193/2230 [35:08<4:42:52, 8.33s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:24,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:26,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:29,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 194/2230 [35:15<4:27:01, 7.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:31,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:32,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:35,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 195/2230 [35:21<4:09:30, 7.36s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:37,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:41,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:42,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:43,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:46,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▉ | 197/2230 [35:31<3:28:32, 6.15s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:47,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:49,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:51,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:52,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:53,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:54,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:34:55,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▉ | 200/2230 [35:42<2:33:11, 4.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:34:59,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:03,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:06,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:10,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████ | 201/2230 [35:56<4:12:26, 7.46s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:35:13,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:17,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:20,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:23,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████ | 202/2230 [36:10<5:15:53, 9.35s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:35:27,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:30,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:34,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:37,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████ | 203/2230 [36:24<5:57:37, 10.59s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:35:40,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:44,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:47,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:50,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████▏ | 204/2230 [36:37<6:25:08, 11.41s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:35:57,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.8992, 'learning_rate': 0.00012119999999999999, 'epoch': 0.46}
{'loss': 4.811, 'learning_rate': 0.00012179999999999999, 'epoch': 0.46}
{'loss': 4.6223, 'learning_rate': 0.0001224, 'epoch': 0.46}
{'loss': 4.7224, 'learning_rate': 0.00012299999999999998, 'epoch': 0.47}
9%|███████▎ | 209/2230 [37:43<7:11:37, 12.81s/it]
{'loss': 4.7679, 'learning_rate': 0.0001236, 'epoch': 0.47}
{'loss': 4.6853, 'learning_rate': 0.00012419999999999998, 'epoch': 0.47}
{'loss': 4.5118, 'learning_rate': 0.00012479999999999997, 'epoch': 0.47}
{'loss': 4.6325, 'learning_rate': 0.00012539999999999999, 'epoch': 0.48}
10%|███████▍ | 213/2230 [38:34<7:10:20, 12.80s/it]
{'loss': 4.4748, 'learning_rate': 0.00012599999999999997, 'epoch': 0.48}
10%|███████▍ | 214/2230 [38:46<7:07:44, 12.73s/it]
{'loss': 4.513, 'learning_rate': 0.0001266, 'epoch': 0.48}
{'loss': 4.5576, 'learning_rate': 0.00012719999999999997, 'epoch': 0.48}
{'loss': 4.555, 'learning_rate': 0.0001278, 'epoch': 0.48}
10%|███████▌ | 217/2230 [39:23<6:57:59, 12.46s/it]
{'loss': 4.5305, 'learning_rate': 0.00012839999999999998, 'epoch': 0.49}
10%|███████▋ | 218/2230 [39:36<6:54:57, 12.37s/it]
{'loss': 4.5359, 'learning_rate': 0.000129, 'epoch': 0.49}
{'loss': 4.4294, 'learning_rate': 0.00012959999999999998, 'epoch': 0.49}
10%|███████▋ | 220/2230 [40:00<6:50:33, 12.26s/it]
{'loss': 4.5847, 'learning_rate': 0.0001302, 'epoch': 0.49}
{'loss': 4.4666, 'learning_rate': 0.00013079999999999998, 'epoch': 0.5}
{'loss': 4.5346, 'learning_rate': 0.0001314, 'epoch': 0.5}
{'loss': 4.4938, 'learning_rate': 0.00013199999999999998, 'epoch': 0.5}
10%|███████▋ | 220/2230 [40:00<6:50:33, 12.26s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4626, 'learning_rate': 0.0001326, 'epoch': 0.5} 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.668, 'learning_rate': 0.00013319999999999999, 'epoch': 0.5} 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4876, 'learning_rate': 0.0001338, 'epoch': 0.51} 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▊ | 224/2230 [40:47<6:38:34, 11.92s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|███████▉ | 227/2230 [41:22<6:31:46, 11.74s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 227/2230 [41:22<6:31:46, 11.74s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6419, 'learning_rate': 0.0001344, 'epoch': 0.51} 10%|███████▉ | 227/2230 [41:22<6:31:46, 11.74s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 227/2230 [41:22<6:31:46, 11.74s/it] Setting `use_cache=False`...1] 2022-03-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5957, 'learning_rate': 0.000135, 'epoch': 0.51} [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:40:46,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5052, 'learning_rate': 0.0001356, 'epoch': 0.51} 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 229/2230 [41:45<6:24:16, 11.52s/it]g-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5292, 'learning_rate': 0.0001362, 'epoch': 0.52} [WARNING|modeling_bart.py:1051] 2022-03-23 17:41:15,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:41:15,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:41:15,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:41:21,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 17:41:21,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 17:35:54,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████ | 232/2230 [42:18<6:11:05, 11.14s/it]
{'loss': 4.4206, 'learning_rate': 0.0001374, 'epoch': 0.52}
{'loss': 4.4587, 'learning_rate': 0.000138, 'epoch': 0.52}
{'loss': 4.51, 'learning_rate': 0.0001386, 'epoch': 0.52}
{'loss': 4.5932, 'learning_rate': 0.0001404, 'epoch': 0.53}
{'loss': 4.4827, 'learning_rate': 0.00014099999999999998, 'epoch': 0.53}
11%|████████▎ | 239/2230 [43:29<5:31:12, 9.98s/it]
{'loss': 4.4535, 'learning_rate': 0.00014159999999999997, 'epoch': 0.54}
{'loss': 4.532, 'learning_rate': 0.0001422, 'epoch': 0.54}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:17,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:21,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:24,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:24,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:28,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:31,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 17:44:31,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 17:42:45,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 5.2117, 'learning_rate': 0.00015, 'epoch': 0.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:34,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:38,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:41,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:44,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.8667, 'learning_rate': 0.00015059999999999997, 'epoch': 0.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:48,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:44:51,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.7689, 'learning_rate': 0.0001512, 'epoch': 0.57}
{'loss': 4.7223, 'learning_rate': 0.00015179999999999998, 'epoch': 0.57}
{'loss': 4.6127, 'learning_rate': 0.0001524, 'epoch': 0.58}
{'loss': 4.5517, 'learning_rate': 0.00015299999999999998, 'epoch': 0.58}
{'loss': 4.6104, 'learning_rate': 0.0001536, 'epoch': 0.58}
{'loss': 4.5236, 'learning_rate': 0.00015419999999999998, 'epoch': 0.58}
{'loss': 4.6442, 'learning_rate': 0.0001548, 'epoch': 0.59}
{'loss': 4.5383, 'learning_rate': 0.00015539999999999998, 'epoch': 0.59}
{'loss': 4.698, 'learning_rate': 0.000156, 'epoch': 0.59}
{'loss': 4.5208, 'learning_rate': 0.00015659999999999998, 'epoch': 0.59}
{'loss': 4.5523, 'learning_rate': 0.0001572, 'epoch': 0.59}
{'loss': 4.5186, 'learning_rate': 0.0001578, 'epoch': 0.6}
{'loss': 4.6301, 'learning_rate': 0.0001584, 'epoch': 0.6}
12%|█████████▎ | 268/2230 [48:28<6:47:05, 12.45s/it]
{'loss': 4.4806, 'learning_rate': 0.000159, 'epoch': 0.6}
12%|█████████▍ | 269/2230 [48:40<6:44:20, 12.37s/it]
{'loss': 4.5985, 'learning_rate': 0.0001596, 'epoch': 0.6}
{'loss': 4.5172, 'learning_rate': 0.0001602, 'epoch': 0.61}
{'loss': 4.4582, 'learning_rate': 0.0001608, 'epoch': 0.61}
{'loss': 4.5442, 'learning_rate': 0.0001614, 'epoch': 0.61}
12%|█████████▌ | 273/2230 [49:28<6:32:48, 12.04s/it]
{'loss': 4.3896, 'learning_rate': 0.000162, 'epoch': 0.61}
12%|█████████▌ | 274/2230 [49:40<6:29:48, 11.96s/it]
{'loss': 4.4235, 'learning_rate': 0.0001626, 'epoch': 0.61}
[WARNING|modeling_utils.py:388] 2022-03-23 17:49:00,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5386, 'learning_rate': 0.0001632, 'epoch': 0.62}
{'loss': 4.4787, 'learning_rate': 0.0001638, 'epoch': 0.62}
12%|█████████▋ | 277/2230 [50:15<6:22:41, 11.76s/it]
{'loss': 4.4511, 'learning_rate': 0.0001644, 'epoch': 0.62}
{'loss': 4.5093, 'learning_rate': 0.000165, 'epoch': 0.62}
{'loss': 4.5233, 'learning_rate': 0.0001656, 'epoch': 0.63}
{'loss': 4.5999, 'learning_rate': 0.0001662, 'epoch': 0.63}
13%|█████████▊ | 281/2230 [51:00<6:09:20, 11.37s/it]
{'loss': 4.5561, 'learning_rate': 0.0001668, 'epoch': 0.63}
{'loss': 4.3713, 'learning_rate': 0.0001674, 'epoch': 0.63}
{'loss': 4.3426, 'learning_rate': 0.000168, 'epoch': 0.63}
[WARNING|modeling_utils.py:388] 2022-03-23 17:50:44,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
13%|█████████▉ | 284/2230 [51:33<5:54:23, 10.93s/it]
{'loss': 4.5017, 'learning_rate': 0.0001686, 'epoch': 0.64}
[WARNING|modeling_utils.py:388] 2022-03-23 17:50:54,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
13%|█████████▉ | 285/2230 [51:43<5:48:35, 10.75s/it]
{'loss': 4.5855, 'learning_rate': 0.00016919999999999997, 'epoch': 0.64}
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:05,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
13%|██████████ | 286/2230 [51:53<5:42:53, 10.58s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:11,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:17,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5955, 'learning_rate': 0.00017039999999999997, 'epoch': 0.64}
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:27,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4935, 'learning_rate': 0.00017099999999999998, 'epoch': 0.65}
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:33,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:36,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:40,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 17:51:44,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
13%|██████████▏ | 290/2230 [52:31<5:13:17, 9.69s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:48,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:50,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:52,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:54,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▏ | 291/2230 [52:40<5:00:06, 9.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:56,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:51:58,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:00,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:02,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▏ | 292/2230 [52:47<4:44:25, 8.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:04,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:05,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:07,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▏ | 293/2230 [52:55<4:28:44, 8.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:11,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:12,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:14,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:15,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▎ | 294/2230 [53:01<4:10:31, 7.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:17,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:20,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▎ | 295/2230 [53:07<3:50:48, 7.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:23,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:24,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:27,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▎ | 296/2230 [53:12<3:30:58, 6.55s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:28,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:30,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▍ | 297/2230 [53:17<3:11:47, 5.95s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:52:32,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:34,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▍ | 298/2230 [53:21<2:53:38, 5.39s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:52:36,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:38,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:40,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:41,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:42,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▍ | 300/2230 [53:28<2:21:55, 4.41s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:52:44,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:48,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:52,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:52:55,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|██████████▌ | 301/2230 [53:42<3:58:45, 7.43s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:52:59,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:02,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:06,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:09,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▌ | 302/2230 [53:56<4:59:42, 9.33s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:53:12,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:16,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:19,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:23,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▌ | 303/2230 [54:09<5:41:47, 10.64s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:53:26,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:29,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:33,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:36,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▋ | 304/2230 [54:23<6:07:53, 11.46s/it][WARNING|modeling_bart.py:1051] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:43,275 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 17:53:46,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9463, 'learning_rate': 0.00018119999999999999, 'epoch': 0.68}
{'loss': 4.7063, 'learning_rate': 0.00018179999999999997, 'epoch': 0.69}
{'loss': 4.6942, 'learning_rate': 0.0001824, 'epoch': 0.69}
{'loss': 4.6681, 'learning_rate': 0.00018299999999999998, 'epoch': 0.69}
{'loss': 4.6439, 'learning_rate': 0.0001836, 'epoch': 0.69}
{'loss': 4.7031, 'learning_rate': 0.00018419999999999998, 'epoch': 0.7}
{'loss': 4.4416, 'learning_rate': 0.0001848, 'epoch': 0.7}
{'loss': 4.5832, 'learning_rate': 0.00018539999999999998, 'epoch': 0.7}
14%|██████████▉ | 313/2230 [56:20<6:49:38, 12.82s/it]
{'loss': 4.5866, 'learning_rate': 0.000186, 'epoch': 0.7}
{'loss': 4.4809, 'learning_rate': 0.00018659999999999998, 'epoch': 0.7}
{'loss': 4.5242, 'learning_rate': 0.0001872, 'epoch': 0.71}
{'loss': 4.4804, 'learning_rate': 0.00018779999999999998, 'epoch': 0.71}
{'loss': 4.4799, 'learning_rate': 0.00018839999999999997, 'epoch': 0.71} 14%|██████████▉ | 313/2230 [56:20<6:49:38, 12.82s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|██████████▉ | 313/2230 [56:20<6:49:38, 12.82s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|██████████▉ | 313/2230 [56:20<6:49:38, 12.82s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|██████████▉ | 313/2230 [56:20<6:49:38, 12.82s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4712, 'learning_rate': 0.00018899999999999999, 'epoch': 0.71} 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4968, 'learning_rate': 0.00018959999999999997, 'epoch': 0.72} 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████ | 318/2230 [57:21<6:34:26, 12.38s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████▏ | 320/2230 [57:46<6:30:12, 12.26s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 14%|███████████▏ | 320/2230 [57:46<6:30:12, 12.26s/it] Setting `use_cache=False`...1] 2022-03-23 17:53:39,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.5755, 'learning_rate': 0.0001902, 'epoch': 0.72}
14%|███████████▏ | 321/2230 [57:58<6:27:58, 12.19s/it]
{'loss': 4.4656, 'learning_rate': 0.00019079999999999998, 'epoch': 0.72}
{'loss': 4.3633, 'learning_rate': 0.0001914, 'epoch': 0.72}
{'loss': 4.458, 'learning_rate': 0.00019199999999999998, 'epoch': 0.72}
{'loss': 4.3913, 'learning_rate': 0.0001926, 'epoch': 0.73}
{'loss': 4.5151, 'learning_rate': 0.00019319999999999998, 'epoch': 0.73}
[WARNING|modeling_utils.py:388] 2022-03-23 17:58:07,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
15%|███████████▍ | 326/2230 [58:57<6:18:21, 11.92s/it]
{'loss': 4.4925, 'learning_rate': 0.0001938, 'epoch': 0.73}
{'loss': 4.4414, 'learning_rate': 0.00019439999999999998, 'epoch': 0.73}
15%|███████████▍ | 328/2230 [59:20<6:10:40, 11.69s/it]
{'loss': 4.3735, 'learning_rate': 0.000195, 'epoch': 0.74}
{'loss': 4.4079, 'learning_rate': 0.00019559999999999998, 'epoch': 0.74}
[WARNING|modeling_utils.py:388] 2022-03-23 17:58:50,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 17:58:54,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
15%|███████████▌ | 330/2230 [59:43<6:03:27, 11.48s/it]
{'loss': 4.5079, 'learning_rate': 0.0001962, 'epoch': 0.74}
{'loss': 4.515, 'learning_rate': 0.00019679999999999999, 'epoch': 0.74}
15%|███████████▎ | 332/2230 [1:00:05<5:56:34, 11.27s/it]
{'loss': 4.3635, 'learning_rate': 0.0001974, 'epoch': 0.74}
{'loss': 4.5648, 'learning_rate': 0.000198, 'epoch': 0.75}
{'loss': 4.4682, 'learning_rate': 0.0001986, 'epoch': 0.75}
[WARNING|modeling_utils.py:388] 2022-03-23 17:59:45,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
15%|███████████▍ | 335/2230 [1:00:37<5:43:08, 10.86s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 17:59:55,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
15%|███████████▍ | 336/2230 [1:00:47<5:37:14, 10.68s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 18:00:04,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4503, 'learning_rate': 0.0001998, 'epoch': 0.75}
[WARNING|modeling_utils.py:388] 2022-03-23 18:00:12,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4666, 'learning_rate': 0.0002004, 'epoch': 0.76}
15%|███████████▌ | 338/2230 [1:01:08<5:30:03, 10.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4885, 'learning_rate': 0.000201, 'epoch': 0.76} 15%|███████████▌ | 338/2230 [1:01:08<5:30:03, 10.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:30,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:33,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:33,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:33,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:36,647 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:36,647 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 18:00:40,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:00:42,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:00:42,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6036, 'learning_rate': 0.0002022, 'epoch': 0.76} [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:46,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:49,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:51,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 18:00:51,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-23 18:00:51,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:00:54,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:00:56,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:00:58,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:01:00,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:01:00,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-23 18:01:02,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:04,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:06,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:08,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:09,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:13,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:14,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:16,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:19,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:21,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:22,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:25,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:26,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:28,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:31,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:33,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:35,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:37,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:39,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:41,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:43,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:47,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:50,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:54,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:01:57,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:01,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:04,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:08,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.8259, 'learning_rate': 0.00020939999999999997, 'epoch': 0.79}
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:11,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:15,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:18,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:21,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:25,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:28,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:31,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:35,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:02:38,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9793, 'learning_rate': 0.00021119999999999996, 'epoch': 0.8}
{'loss': 4.9003, 'learning_rate': 0.00021179999999999997, 'epoch': 0.8}
{'loss': 4.7375, 'learning_rate': 0.00021239999999999996, 'epoch': 0.8}
{'loss': 4.7576, 'learning_rate': 0.00021299999999999997, 'epoch': 0.8}
16%|████████████▏ | 359/2230 [1:04:27<6:39:23, 12.81s/it]
{'loss': 4.6035, 'learning_rate': 0.00021359999999999996, 'epoch': 0.8}
{'loss': 4.5928, 'learning_rate': 0.00021419999999999998, 'epoch': 0.81}
{'loss': 4.5001, 'learning_rate': 0.00021479999999999996, 'epoch': 0.81}
{'loss': 4.5366, 'learning_rate': 0.00021539999999999998, 'epoch': 0.81}
16%|████████████▎ | 363/2230 [1:05:18<6:39:42, 12.85s/it]
16%|████████████▍ | 364/2230 [1:05:31<6:37:23, 12.78s/it]
{'loss': 4.5041, 'learning_rate': 0.00021659999999999998, 'epoch': 0.82}
16%|████████████▍ | 365/2230 [1:05:43<6:34:25, 12.69s/it]
{'loss': 4.6164, 'learning_rate': 0.00021719999999999997, 'epoch': 0.82}
{'loss': 4.4303, 'learning_rate': 0.00021779999999999998, 'epoch': 0.82}
{'loss': 4.5242, 'learning_rate': 0.00021839999999999997, 'epoch': 0.82}
{'loss': 4.5089, 'learning_rate': 0.00021899999999999998, 'epoch': 0.83}
{'loss': 4.4234, 'learning_rate': 0.00021959999999999997, 'epoch': 0.83}
{'loss': 4.4205, 'learning_rate': 0.00022019999999999999, 'epoch': 0.83}
{'loss': 4.3864, 'learning_rate': 0.00022079999999999997, 'epoch': 0.83}
17%|████████████▋ | 372/2230 [1:07:09<6:19:09, 12.24s/it]
{'loss': 4.4607, 'learning_rate': 0.0002214, 'epoch': 0.83}
{'loss': 4.4477, 'learning_rate': 0.00022199999999999998, 'epoch': 0.84}
{'loss': 4.2569, 'learning_rate': 0.0002226, 'epoch': 0.84}
{'loss': 4.4793, 'learning_rate': 0.00022319999999999998, 'epoch': 0.84}
{'loss': 4.3588, 'learning_rate': 0.0002238, 'epoch': 0.84}
17%|████████████▊ | 377/2230 [1:08:08<6:07:33, 11.90s/it]
{'loss': 4.4029, 'learning_rate': 0.00022439999999999998, 'epoch': 0.85}
{'loss': 4.3932, 'learning_rate': 0.000225, 'epoch': 0.85}
17%|████████████▉ | 379/2230 [1:08:31<5:58:10, 11.61s/it]
{'loss': 4.2944, 'learning_rate': 0.00022559999999999998, 'epoch': 0.85}
[WARNING|modeling_utils.py:388] 2022-03-23 18:07:54,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3351, 'learning_rate': 0.00022619999999999997, 'epoch': 0.85}
{'loss': 4.4359, 'learning_rate': 0.00022679999999999998, 'epoch': 0.85}
17%|█████████████ | 382/2230 [1:09:05<5:47:17, 11.28s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:08:31,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3975, 'learning_rate': 0.00022799999999999999, 'epoch': 0.86}
[WARNING|modeling_utils.py:388] 2022-03-23 18:08:35,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.308, 'learning_rate': 0.00022859999999999997, 'epoch': 0.86}
[WARNING|modeling_utils.py:388] 2022-03-23 18:08:53,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:09:01,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.2904, 'learning_rate': 0.00022979999999999997, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:09,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:12,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4339, 'learning_rate': 0.0002304, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:22,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4791, 'learning_rate': 0.00023099999999999998, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:25,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:09:30,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
17%|█████████████▎ | 389/2230 [1:10:16<5:07:04, 10.01s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:34,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:36,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:09:40,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3603, 'learning_rate': 0.00023219999999999998, 'epoch': 0.87}
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:44,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:46,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:48,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:50,534 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:52,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:54,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:56,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:09:58,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:00,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:02,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:04,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:06,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:08,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:11,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:13,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:14,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:16,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:19,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:21,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:23,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:25,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:27,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:30,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:32,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:34,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:36,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:37,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:40,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:42,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:46,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:50,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:53,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:10:57,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:00,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:03,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:07,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:10,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:14,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:17,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:20,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:24,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:27,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:31,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:11:34,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.0418, 'learning_rate': 0.0002406, 'epoch': 0.91}
18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]
18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.8755, 'learning_rate': 0.00024119999999999998, 'epoch': 0.91} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.7933, 'learning_rate': 0.0002418, 'epoch': 0.91} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.7055, 'learning_rate': 0.00024239999999999998, 'epoch': 0.91} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6617, 'learning_rate': 0.000243, 'epoch': 0.91} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5539, 'learning_rate': 0.00024359999999999999, 'epoch': 0.92} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.492, 'learning_rate': 0.00024419999999999997, 'epoch': 0.92} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6706, 'learning_rate': 0.0002448, 'epoch': 0.92} 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 405/2230 [1:12:33<6:06:04, 12.04s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.4853, 'learning_rate': 0.00024539999999999995, 'epoch': 0.92} g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4369, 'learning_rate': 0.00024599999999999996, 'epoch': 0.93} 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 413/2230 [1:14:17<6:27:05, 12.78s/it]g-point operations will not be computed-23 18:00:24,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████ | 414/2230 [1:14:29<6:23:39, 12.68s/it]
{'loss': 4.4848, 'learning_rate': 0.0002466, 'epoch': 0.93}
{'loss': 4.5455, 'learning_rate': 0.0002472, 'epoch': 0.93}
{'loss': 4.4175, 'learning_rate': 0.00024779999999999995, 'epoch': 0.93}
{'loss': 4.4364, 'learning_rate': 0.00024839999999999997, 'epoch': 0.93}
[WARNING|modeling_utils.py:388] 2022-03-23 18:14:34,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3047, 'learning_rate': 0.000249, 'epoch': 0.94}
{'loss': 4.4146, 'learning_rate': 0.00024959999999999994, 'epoch': 0.94}
19%|██████████████▎ | 420/2230 [1:15:42<6:06:17, 12.14s/it]
{'loss': 4.3201, 'learning_rate': 0.00025019999999999996, 'epoch': 0.94}
{'loss': 4.4562, 'learning_rate': 0.00025079999999999997, 'epoch': 0.94}
{'loss': 4.3312, 'learning_rate': 0.0002514, 'epoch': 0.95}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:15:35,647 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 424/2230 [1:16:29<5:54:19, 11.77s/it]
{'loss': 4.4837, 'learning_rate': 0.00025259999999999996, 'epoch': 0.95}
{'loss': 4.3916, 'learning_rate': 0.0002532, 'epoch': 0.95}
{'loss': 4.405, 'learning_rate': 0.0002538, 'epoch': 0.96}
{'loss': 4.3168, 'learning_rate': 0.00025439999999999995, 'epoch': 0.96}
19%|██████████████▌ | 428/2230 [1:17:14<5:41:52, 11.38s/it]
{'loss': 4.3303, 'learning_rate': 0.00025499999999999996, 'epoch': 0.96}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:16:36,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.2059, 'learning_rate': 0.0002556, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-23 18:16:50,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:16:54,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:16:58,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
19%|██████████████▋ | 431/2230 [1:17:47<5:29:17, 10.98s/it]
{'loss': 4.3357, 'learning_rate': 0.00025679999999999995, 'epoch': 0.97}
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:09,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
19%|██████████████▋ | 432/2230 [1:17:57<5:23:38, 10.80s/it]
{'loss': 4.5314, 'learning_rate': 0.00025739999999999997, 'epoch': 0.97}
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:23,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:29,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
19%|██████████████▊ | 434/2230 [1:18:17<5:10:59, 10.39s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:35,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:41,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3382, 'learning_rate': 0.00025919999999999996, 'epoch': 0.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:17:45,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:49,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:51,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:53,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:56,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:17:58,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:18:00,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2964, 'learning_rate': 0.0002604, 'epoch': 0.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:03,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:06,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:08,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:10,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:12,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:13,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:15,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:17,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:19,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:22,368 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:24,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:25,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:28,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:30,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:32,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:33,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:36,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:38,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:40,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:42,706 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:44,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:46,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:48,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:49,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:52,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:56,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:18:59,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:03,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:07,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:10,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:13,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:17,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.7358, 'learning_rate': 0.000267, 'epoch': 1.0}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:20,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:24,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:27,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:30,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.3335, 'learning_rate': 0.0002676, 'epoch': 1.01}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:34,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:37,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:41,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:19:44,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9919, 'learning_rate': 0.00026819999999999996, 'epoch': 1.01}
{'loss': 4.7456, 'learning_rate': 0.0002688, 'epoch': 1.01}
 20%|███████████████▍ | 452/2230 [1:20:56<6:11:30, 12.54s/it]
{'loss': 4.6112, 'learning_rate': 0.0002694, 'epoch': 1.01}
{'loss': 4.4133, 'learning_rate': 0.00027, 'epoch': 1.02}
 20%|███████████████▍ | 454/2230 [1:21:23<6:20:22, 12.85s/it]
{'loss': 4.2591, 'learning_rate': 0.0002712, 'epoch': 1.02}
{'loss': 4.3699, 'learning_rate': 0.0002718, 'epoch': 1.02}
{'loss': 4.1498, 'learning_rate': 0.0002724, 'epoch': 1.02}
{'loss': 4.1313, 'learning_rate': 0.00027299999999999997, 'epoch': 1.03}
{'loss': 4.2658, 'learning_rate': 0.0002736, 'epoch': 1.03}
{'loss': 4.1512, 'learning_rate': 0.0002742, 'epoch': 1.03}
{'loss': 4.1147, 'learning_rate': 0.0002748, 'epoch': 1.03}
{'loss': 4.0338, 'learning_rate': 0.00027539999999999997, 'epoch': 1.04}
{'loss': 4.0853, 'learning_rate': 0.000276, 'epoch': 1.04}
{'loss': 4.0146, 'learning_rate': 0.0002766, 'epoch': 1.04}
{'loss': 3.9486, 'learning_rate': 0.0002772, 'epoch': 1.04}
[WARNING|modeling_utils.py:388] 2022-03-23 18:23:07,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9637, 'learning_rate': 0.0002778, 'epoch': 1.04}
{'loss': 4.0797, 'learning_rate': 0.0002784, 'epoch': 1.05}
{'loss': 3.9126, 'learning_rate': 0.000279, 'epoch': 1.05}
[WARNING|modeling_utils.py:388] 2022-03-23 18:23:46,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9294, 'learning_rate': 0.00027959999999999997, 'epoch': 1.05}
 21%|████████████████ | 470/2230 [1:24:42<5:52:14, 12.01s/it]
{'loss': 4.0885, 'learning_rate': 0.0002802, 'epoch': 1.05}
{'loss': 4.0577, 'learning_rate': 0.0002808, 'epoch': 1.06}
{'loss': 3.9314, 'learning_rate': 0.00028139999999999996, 'epoch': 1.06}
 21%|████████████████ | 473/2230 [1:25:16<5:41:48, 11.67s/it]
{'loss': 3.9632, 'learning_rate': 0.00028199999999999997, 'epoch': 1.06}
{'loss': 3.7662, 'learning_rate': 0.0002826, 'epoch': 1.06}
{'loss': 3.7393, 'learning_rate': 0.00028319999999999994, 'epoch': 1.07}
{'loss': 3.9623, 'learning_rate': 0.00028379999999999996, 'epoch': 1.07}
 21%|████████████████▎ | 477/2230 [1:26:02<5:32:08, 11.37s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:25:20,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 21%|████████████████▎ | 478/2230 [1:26:13<5:27:57, 11.23s/it]
{'loss': 3.8946, 'learning_rate': 0.000285, 'epoch': 1.07}
{'loss': 3.9246, 'learning_rate': 0.00028559999999999995, 'epoch': 1.07}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:25:49,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9423, 'learning_rate': 0.00028619999999999996, 'epoch': 1.08}
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:01,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:09,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.7827, 'learning_rate': 0.00028739999999999994, 'epoch': 1.08}
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:17,908 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:20,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8545, 'learning_rate': 0.00028799999999999995, 'epoch': 1.08}
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:24,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:28,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 22%|████████████████▍ | 484/2230 [1:27:14<4:54:07, 10.11s/it]
{'loss': 3.8231, 'learning_rate': 0.00028859999999999997, 'epoch': 1.09}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:34,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:36,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:40,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:42,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:44,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:26:46,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:50,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:52,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:54,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▌ | 487/2230 [1:27:40<4:26:26, 9.17s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:26:56,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:26:58,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:01,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:03,023 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▋ | 488/2230 [1:27:48<4:17:47, 8.88s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:05,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:06,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:08,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:10,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▋ | 489/2230 [1:27:56<4:04:35, 8.43s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:12,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:14,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:17,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▋ | 490/2230 [1:28:02<3:49:35, 7.92s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:19,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:20,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:23,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▋ | 491/2230 [1:28:08<3:32:24, 7.33s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:24,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:27,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▊ | 492/2230 [1:28:14<3:13:04, 6.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:31,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:33,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:35,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:37,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:39,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▊ | 495/2230 [1:28:25<2:19:01, 4.81s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:41,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:43,853 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▉ | 496/2230 [1:28:28<2:03:01, 4.26s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:45,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:49,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:52,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:27:56,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▉ | 497/2230 [1:28:42<3:27:07, 7.17s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:27:59,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:02,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:06,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:09,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▉ | 498/2230 [1:28:56<4:23:26, 9.13s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:28:13,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:16,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:19,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:23,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|█████████████████ | 499/2230 [1:29:10<5:01:07, 10.44s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:28:26,672 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:29,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:33,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:28:36,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 22%|█████████████████ | 500/2230 [1:29:23<5:29:27, 11.43s/it][INFO|trainer.py:560] 2022-03-23 18:28:39,398 >> The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
03/23/2022 18:38:22 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 4.444121360778809, 'eval_wer': 1.7011503371677905, 'eval_runtime': 583.2794, 'eval_samples_per_second': 4.53, 'eval_steps_per_second': 0.567, 'epoch': 1.12}
03/23/2022 18:39:36 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220323_165914-1vl16ira/run-1vl16ira.wandb']. This may take a bit of time if the files are large.
{'loss': 4.414, 'learning_rate': 0.0002988, 'epoch': 1.12}
{'loss': 4.4024, 'learning_rate': 0.00029939999999999996, 'epoch': 1.13}
{'loss': 4.3597, 'learning_rate': 0.0003, 'epoch': 1.13}
{'loss': 4.1466, 'learning_rate': 0.00029982658959537567, 'epoch': 1.13}
23%|████████████████▉ | 505/2230 [1:41:52<29:54:44, 62.43s/it] {'loss': 4.204, 'learning_rate': 0.0002996531791907514, 'epoch': 1.13}
{'loss': 4.296, 'learning_rate': 0.00029947976878612716, 'epoch': 1.13}
23%|█████████████████ | 507/2230 [1:42:19<17:57:16, 37.51s/it] {'loss': 4.1418, 'learning_rate': 0.00029930635838150286, 'epoch': 1.14}
{'loss': 4.3272, 'learning_rate': 0.0002991329479768786, 'epoch': 1.14}
{'loss': 4.1266, 'learning_rate': 0.0002989595375722543, 'epoch': 1.14}
{'loss': 4.1107, 'learning_rate': 0.00029878612716763005, 'epoch': 1.14}
{'loss': 4.0438, 'learning_rate': 0.00029861271676300574, 'epoch': 1.15}
{'loss': 4.0974, 'learning_rate': 0.0002984393063583815, 'epoch': 1.15}
{'loss': 4.07, 'learning_rate': 0.0002982658959537572, 'epoch': 1.15}
{'loss': 3.9892, 'learning_rate': 0.00029809248554913293, 'epoch': 1.15}
{'loss': 4.0635, 'learning_rate': 0.0002979190751445086, 'epoch': 1.15}
23%|█████████████████▌ | 516/2230 [1:44:14<6:24:47, 13.47s/it] {'loss': 4.0718, 'learning_rate': 0.00029774566473988437, 'epoch': 1.16}
{'loss': 3.99, 'learning_rate': 0.00029757225433526006, 'epoch': 1.16}
{'loss': 4.0126, 'learning_rate': 0.0002973988439306358, 'epoch': 1.16}
{'loss': 4.0171, 'learning_rate': 0.00029722543352601156, 'epoch': 1.16}
{'loss': 4.0175, 'learning_rate': 0.00029705202312138725, 'epoch': 1.17}
23%|█████████████████▊ | 521/2230 [1:45:13<5:45:26, 12.13s/it]
{'loss': 3.9543, 'learning_rate': 0.00029687861271676295, 'epoch': 1.17}
{'loss': 3.9275, 'learning_rate': 0.0002967052023121387, 'epoch': 1.17}
{'loss': 3.97, 'learning_rate': 0.00029653179190751444, 'epoch': 1.17}
23%|█████████████████▊ | 524/2230 [1:45:48<5:32:37, 11.70s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:45:04,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0399, 'learning_rate': 0.00029635838150289014, 'epoch': 1.17}
[WARNING|modeling_utils.py:388] 2022-03-23 18:45:11,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8694, 'learning_rate': 0.00029618497109826583, 'epoch': 1.18}
[WARNING|modeling_utils.py:388] 2022-03-23 18:45:25,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8722, 'learning_rate': 0.0002960115606936416, 'epoch': 1.18}
[WARNING|modeling_utils.py:388] 2022-03-23 18:45:37,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8752, 'learning_rate': 0.0002958381502890173, 'epoch': 1.18}
[WARNING|modeling_utils.py:388] 2022-03-23 18:45:43,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9737, 'learning_rate': 0.000295664739884393, 'epoch': 1.18}
24%|██████████████████ | 529/2230 [1:46:44<5:14:59, 11.11s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:07,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8993, 'learning_rate': 0.0002953179190751445, 'epoch': 1.19}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:46:12,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:17,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9065, 'learning_rate': 0.0002951445086705202, 'epoch': 1.19}
[WARNING|modeling_bart.py:1051] 2022-03-23 18:46:26,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0046, 'learning_rate': 0.0002949710982658959, 'epoch': 1.19}
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:34,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:40,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:46:44,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:48,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9889, 'learning_rate': 0.0002946242774566474, 'epoch': 1.2}
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:54,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:46:57,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:00,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:04,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:06,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
24%|██████████████████▎ | 536/2230 [1:47:52<4:31:52, 9.63s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:47:09,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:11,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:13,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:15,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
24%|██████████████████▎ | 537/2230 [1:48:01<4:22:42, 9.31s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 18:47:19,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:23,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:25,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:27,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:29,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:31,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:32,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:34,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:38,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:39,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:41,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:42,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:45,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:47,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:49,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:51,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:53,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:56,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:47:58,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:00,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:02,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:03,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:06,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:06,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.927, 'learning_rate': 0.00029254335260115604, 'epoch': 1.22}
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:10,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:14,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:17,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:21,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.8096, 'learning_rate': 0.0002923699421965318, 'epoch': 1.23}
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:24,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:28,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:31,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:35,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:38,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:42,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:45,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:48,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7905, 'learning_rate': 0.0002920231213872832, 'epoch': 1.23}
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:52,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:55,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:48:59,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6619, 'learning_rate': 0.0002918497109826589, 'epoch': 1.23}
{'loss': 4.4531, 'learning_rate': 0.0002916763005780347, 'epoch': 1.24}
{'loss': 4.3329, 'learning_rate': 0.00029150289017341037, 'epoch': 1.24}
{'loss': 4.4172, 'learning_rate': 0.0002913294797687861, 'epoch': 1.24}
{'loss': 4.2514, 'learning_rate': 0.00029115606936416186, 'epoch': 1.24}
{'loss': 4.2204, 'learning_rate': 0.00029098265895953756, 'epoch': 1.24}
{'loss': 4.0057, 'learning_rate': 0.00029080924855491325, 'epoch': 1.25}
{'loss': 4.0476, 'learning_rate': 0.000290635838150289, 'epoch': 1.25}
{'loss': 4.0568, 'learning_rate': 0.00029046242774566475, 'epoch': 1.25}
{'loss': 4.0171, 'learning_rate': 0.00029028901734104044, 'epoch': 1.25}
[WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 4.0169, 'learning_rate': 0.00029011560693641613, 'epoch': 1.26} [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.0429, 'learning_rate': 0.0002899421965317919, 'epoch': 1.26} [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8805, 'learning_rate': 0.00028976878612716763, 'epoch': 1.26} [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:49:02,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.0099, 'learning_rate': 0.0002895953757225433, 'epoch': 1.26} 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 563/2230 [1:52:35<5:52:06, 12.67s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8666, 'learning_rate': 0.00028942196531791907, 'epoch': 1.26} 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▏ | 564/2230 [1:52:47<5:49:28, 12.59s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8508, 'learning_rate': 0.00028924855491329476, 'epoch': 1.27} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8488, 'learning_rate': 0.0002890751445086705, 'epoch': 1.27} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8655, 'learning_rate': 0.0002889017341040462, 'epoch': 1.27} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.0298, 'learning_rate': 0.00028872832369942195, 'epoch': 1.27} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8001, 'learning_rate': 0.00028855491329479765, 'epoch': 1.28} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8585, 'learning_rate': 0.0002883815028901734, 'epoch': 1.28} 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 25%|███████████████████▎ | 565/2230 [1:52:59<5:46:08, 12.47s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.9276, 'learning_rate': 0.00028803468208092484, 'epoch': 1.28} 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 26%|███████████████████▍ | 571/2230 [1:54:11<5:30:12, 11.94s/it]g-point operations will not be computed-23 18:47:17,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
[WARNING|modeling_bart.py:1051] 2022-03-23 18:53:52,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:53:58,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.7575, 'learning_rate': 0.0002876878612716763, 'epoch': 1.29}
{'loss': 3.7872, 'learning_rate': 0.000287514450867052, 'epoch': 1.29}
26%|███████████████████▋ | 576/2230 [1:55:08<5:17:28, 11.52s/it]
{'loss': 3.8801, 'learning_rate': 0.0002873410404624277, 'epoch': 1.29}
{'loss': 3.9001, 'learning_rate': 0.0002871676300578034, 'epoch': 1.29}
{'loss': 3.7872, 'learning_rate': 0.00028699421965317916, 'epoch': 1.3}
[WARNING|modeling_utils.py:388] 2022-03-23 18:54:51,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.7781, 'learning_rate': 0.0002868208092485549, 'epoch': 1.3}
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:06,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8367, 'learning_rate': 0.0002866473988439306, 'epoch': 1.3}
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:14,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
26%|███████████████████▊ | 581/2230 [1:56:02<4:54:23, 10.71s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:20,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:26,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8266, 'learning_rate': 0.0002863005780346821, 'epoch': 1.3}
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:36,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8546, 'learning_rate': 0.0002861271676300578, 'epoch': 1.31}
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:42,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:45,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.6754, 'learning_rate': 0.0002859537572254335, 'epoch': 1.31}
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:50,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:55:53,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
26%|███████████████████▉ | 585/2230 [1:56:41<4:28:37, 9.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:00,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:03,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:05,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:07,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:09,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:11,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:13,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:15,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:17,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:19,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:21,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:23,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:24,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:26,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:28,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:31,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:33,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:34,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:37,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:38,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:41,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:43,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:45,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:47,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:50,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:51,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:54,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:55,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:57,688 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:56:59,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:00,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:04,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:08,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:11,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:15,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:18,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:22,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:25,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:29,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:32,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:35,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:39,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:42,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.943, 'learning_rate': 0.0002833526011560693, 'epoch': 1.34}
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:46,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:49,306 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:52,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:56,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.7358, 'learning_rate': 0.00028317919075144507, 'epoch': 1.35}
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3826, 'learning_rate': 0.00028300578034682076, 'epoch': 1.35}
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.3389, 'learning_rate': 0.0002828323699421965, 'epoch': 1.35} [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.2211, 'learning_rate': 0.00028265895953757226, 'epoch': 1.35} [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.1059, 'learning_rate': 0.00028248554913294795, 'epoch': 1.35} [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 18:57:59,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
27%|████████████████████▌ | 605/2230 [1:59:46<5:48:06, 12.85s/it]
{'loss': 4.0504, 'learning_rate': 0.00028231213872832365, 'epoch': 1.36}
{'loss': 3.958, 'learning_rate': 0.0002821387283236994, 'epoch': 1.36}
{'loss': 4.0532, 'learning_rate': 0.00028196531791907514, 'epoch': 1.36}
{'loss': 3.9602, 'learning_rate': 0.00028179190751445083, 'epoch': 1.36}
{'loss': 3.979, 'learning_rate': 0.0002816184971098266, 'epoch': 1.37}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:00:00,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
27%|████████████████████▊ | 610/2230 [2:00:50<5:43:29, 12.72s/it]
{'loss': 3.8953, 'learning_rate': 0.0002814450867052023, 'epoch': 1.37}
27%|████████████████████▊ | 611/2230 [2:01:02<5:41:03, 12.64s/it]
{'loss': 3.8389, 'learning_rate': 0.000281271676300578, 'epoch': 1.37}
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8168, 'learning_rate': 0.0002810982658959537, 'epoch': 1.37} Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8634, 'learning_rate': 0.00028092485549132947, 'epoch': 1.37} Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8456, 'learning_rate': 0.00028075144508670516, 'epoch': 1.38} Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.9215, 'learning_rate': 0.0002805780346820809, 'epoch': 1.38} Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 3.7437, 'learning_rate': 0.0002804046242774566, 'epoch': 1.38}
{'loss': 3.7972, 'learning_rate': 0.00028023121387283235, 'epoch': 1.38}
28%|█████████████████████ | 618/2230 [2:02:29<5:27:30, 12.19s/it]
{'loss': 3.6682, 'learning_rate': 0.00028005780346820804, 'epoch': 1.39}
{'loss': 3.7301, 'learning_rate': 0.0002798843930635838, 'epoch': 1.39}
[WARNING|modeling_utils.py:388] 2022-03-23 19:01:59,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.663, 'learning_rate': 0.00027971098265895954, 'epoch': 1.39}
{'loss': 3.7823, 'learning_rate': 0.00027953757225433523, 'epoch': 1.39}
28%|█████████████████████▏ | 622/2230 [2:03:16<5:17:22, 11.84s/it]
{'loss': 3.763, 'learning_rate': 0.0002793641618497109, 'epoch': 1.39}
{'loss': 3.696, 'learning_rate': 0.00027919075144508667, 'epoch': 1.4}
[WARNING|modeling_utils.py:388] 2022-03-23 19:02:54,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.6394, 'learning_rate': 0.0002790173410404624, 'epoch': 1.4}
28%|█████████████████████▎ | 625/2230 [2:03:50<5:12:58, 11.70s/it]
{'loss': 3.6472, 'learning_rate': 0.0002788439306358381, 'epoch': 1.4}
{'loss': 3.6325, 'learning_rate': 0.00027867052023121386, 'epoch': 1.4}
28%|█████████████████████▎ | 627/2230 [2:04:13<5:06:18, 11.47s/it]
{'loss': 3.5473, 'learning_rate': 0.00027849710982658955, 'epoch': 1.41}
{'loss': 3.6429, 'learning_rate': 0.0002783236994219653, 'epoch': 1.41}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:03:47,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.6473, 'learning_rate': 0.000278150289017341, 'epoch': 1.41}
28%|█████████████████████▍ | 630/2230 [2:04:46<4:54:26, 11.04s/it]
{'loss': 3.5185, 'learning_rate': 0.00027797687861271674, 'epoch': 1.41} 28%|█████████████████████▍ | 630/2230 [2:04:46<4:54:26, 11.04s/it] Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 28%|█████████████████████▍ | 630/2230 [2:04:46<4:54:26, 11.04s/it] Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 28%|█████████████████████▍ | 630/2230 [2:04:46<4:54:26, 11.04s/it] Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:11,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:11,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.7047, 'learning_rate': 0.0002778034682080925, 'epoch': 1.41} [WARNING|modeling_utils.py:388] 2022-03-23 19:04:11,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:11,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:11,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:22,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:04:22,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.467, 'learning_rate': 0.0002776300578034682, 'epoch': 1.42} [WARNING|modeling_utils.py:388] 2022-03-23 19:04:22,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:22,125 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:30,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:30,418 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:30,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.4672, 'learning_rate': 0.0002774566473988439, 'epoch': 1.42} [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:30,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:30,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:40,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 28%|█████████████████████▌ | 634/2230 [2:05:26<4:31:58, 10.22s/it] Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 28%|█████████████████████▌ | 634/2230 [2:05:26<4:31:58, 10.22s/it] Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:44,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:44,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:48,447 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:04:48,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:52,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:52,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:54,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:56,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:04:56,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 18:55:57,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|█████████████████████▋ | 636/2230 [2:05:44<4:17:43, 9.70s/it][WARNING|modeling_bart.py:1051] 2022-03-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|█████████████████████▋ | 636/2230 [2:05:44<4:17:43, 9.70s/it][WARNING|modeling_bart.py:1051] 2022-03-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 3.2793, 'learning_rate': 0.00027693641618497107, 'epoch': 1.43}
[WARNING|modeling_utils.py:388] 2022-03-23 19:05:04,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:05:06,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:05:08,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:05:11,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:15,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:17,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:19,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:21,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:22,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:24,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:26,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:28,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:31,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:33,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:34,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:36,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:39,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:40,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:43,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:44,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:47,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:49,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:50,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:52,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:55,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:57,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:59,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:05:59,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:02,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:06,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:09,952 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:13,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.8762, 'learning_rate': 0.000275028901734104, 'epoch': 1.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:16,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:20,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:23,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:27,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:30,651 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:30,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:34,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:34,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:37,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:40,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:40,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 5.1378, 'learning_rate': 0.00027468208092485546, 'epoch': 1.46} [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:44,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:44,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:47,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:50,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:50,762 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.6691, 'learning_rate': 0.00027450867052023116, 'epoch': 1.46} [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 4.1925, 'learning_rate': 0.0002743352601156069, 'epoch': 1.46} [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.9882, 'learning_rate': 0.00027416184971098265, 'epoch': 1.46} [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.879, 'learning_rate': 0.00027398843930635835, 'epoch': 1.46} [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:06:54,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.8319, 'learning_rate': 0.00027364161849710984, 'epoch': 1.47} 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.7363, 'learning_rate': 0.00027346820809248554, 'epoch': 1.47} 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.6597, 'learning_rate': 0.00027329479768786123, 'epoch': 1.47} 29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
29%|██████████████████████▎ | 654/2230 [2:08:32<5:36:37, 12.82s/it]
30%|██████████████████████▍ | 658/2230 [2:09:24<5:37:14, 12.87s/it]
{'loss': 3.6444, 'learning_rate': 0.000273121387283237, 'epoch': 1.48}
{'loss': 3.5363, 'learning_rate': 0.0002729479768786127, 'epoch': 1.48}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:08:56,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.4726, 'learning_rate': 0.0002727745664739884, 'epoch': 1.48}
{'loss': 3.3294, 'learning_rate': 0.0002726011560693641, 'epoch': 1.48}
{'loss': 3.4765, 'learning_rate': 0.00027242774566473986, 'epoch': 1.48}
{'loss': 3.3104, 'learning_rate': 0.0002722543352601156, 'epoch': 1.49}
{'loss': 3.3259, 'learning_rate': 0.0002720809248554913, 'epoch': 1.49}
[WARNING|modeling_utils.py:388] 2022-03-23 19:10:00,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
30%|██████████████████████▋ | 665/2230 [2:10:52<5:25:33, 12.48s/it]
{'loss': 3.1058, 'learning_rate': 0.00027190751445086705, 'epoch': 1.49}
{'loss': 3.2575, 'learning_rate': 0.00027173410404624274, 'epoch': 1.49}
[WARNING|modeling_utils.py:388] 2022-03-23 19:10:28,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.0172, 'learning_rate': 0.0002715606936416185, 'epoch': 1.5}
{'loss': 3.0809, 'learning_rate': 0.0002713872832369942, 'epoch': 1.5}
[WARNING|modeling_utils.py:388] 2022-03-23 19:10:51,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1742, 'learning_rate': 0.00027121387283236993, 'epoch': 1.5}
30%|██████████████████████▊ | 670/2230 [2:11:52<5:13:19, 12.05s/it]
{'loss': 3.0458, 'learning_rate': 0.0002710404624277456, 'epoch': 1.5}
{'loss': 2.9033, 'learning_rate': 0.0002708670520231214, 'epoch': 1.5}
{'loss': 2.9879, 'learning_rate': 0.00027069364161849707, 'epoch': 1.51}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:11:45,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:11:45,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:11:45,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:11:45,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:11:45,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
30%|██████████████████████▉ | 674/2230 [2:12:38<5:04:13, 11.73s/it]
{'loss': 2.786, 'learning_rate': 0.0002703468208092485, 'epoch': 1.51}
{'loss': 2.9336, 'learning_rate': 0.00027017341040462426, 'epoch': 1.51}
30%|███████████████████████ | 676/2230 [2:13:02<5:01:14, 11.63s/it]
{'loss': 2.7786, 'learning_rate': 0.00027, 'epoch': 1.52}
{'loss': 2.664, 'learning_rate': 0.0002698265895953757, 'epoch': 1.52}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:12:36,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
30%|███████████████████████ | 678/2230 [2:13:24<4:53:31, 11.35s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 19:12:50,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.6503, 'learning_rate': 0.00026947976878612714, 'epoch': 1.52}
[WARNING|modeling_utils.py:388] 2022-03-23 19:12:54,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.7218, 'learning_rate': 0.0002693063583815029, 'epoch': 1.52}
[WARNING|modeling_utils.py:388] 2022-03-23 19:13:12,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:13:21,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 2.5694, 'learning_rate': 0.00026895953757225433, 'epoch': 1.53}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:13:27,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 2.5128, 'learning_rate': 0.0002687861271676301, 'epoch': 1.53}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:13:37,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:13:41,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.7005, 'learning_rate': 0.0002687861271676301, 'epoch': 1.53}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:13:45,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:13:49,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:13:49,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:13:49,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:13:53,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:13:56,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:13:58,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 31%|███████████████████████▍ | 686/2230 [2:14:44<4:09:04, 9.68s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 31%|███████████████████████▍ | 686/2230 [2:14:44<4:09:04, 9.68s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 2.5138, 'learning_rate': 0.00026843930635838146, 'epoch': 1.54} [WARNING|modeling_bart.py:1051] 2022-03-23 19:14:03,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:14:05,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:07,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:10,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:12,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:14,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:16,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:18,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:19,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:21,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:23,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:25,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:28,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:29,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:31,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:34,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:35,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:38,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:40,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:42,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:44,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:47,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:49,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:51,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:52,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:54,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:14:56,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.1921, 'learning_rate': 0.0002667052023121387, 'epoch': 1.56}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:00,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:03,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:07,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:10,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 8.7366, 'learning_rate': 0.0002665317919075144, 'epoch': 1.56}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:14,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:17,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:20,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:24,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.141, 'learning_rate': 0.00026635838150289016, 'epoch': 1.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:27,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:31,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:34,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:37,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6943, 'learning_rate': 0.00026618497109826586, 'epoch': 1.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:41,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:44,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:48,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:51,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.7024, 'learning_rate': 0.00026601156069364155, 'epoch': 1.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:15:55,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
31%|███████████████████████▉ | 701/2230 [2:16:50<5:08:09, 12.09s/it]
{'loss': 3.0568, 'learning_rate': 0.00026566473988439305, 'epoch': 1.57}
{'loss': 3.0959, 'learning_rate': 0.00026549132947976874, 'epoch': 1.58}
{'loss': 2.9189, 'learning_rate': 0.0002653179190751445, 'epoch': 1.58}
{'loss': 2.7027, 'learning_rate': 0.00026514450867052024, 'epoch': 1.58}
{'loss': 2.7317, 'learning_rate': 0.00026497109826589593, 'epoch': 1.58}
{'loss': 2.5358, 'learning_rate': 0.0002647976878612716, 'epoch': 1.59}
{'loss': 2.3571, 'learning_rate': 0.00026462427745664737, 'epoch': 1.59}
{'loss': 2.3926, 'learning_rate': 0.0002644508670520231, 'epoch': 1.59}
32%|████████████████████████▏ | 710/2230 [2:18:47<5:24:05, 12.79s/it]
{'loss': 2.1589, 'learning_rate': 0.0002642774566473988, 'epoch': 1.59}
{'loss': 2.0983, 'learning_rate': 0.00026410404624277456, 'epoch': 1.59}
{'loss': 1.9613, 'learning_rate': 0.00026393063583815025, 'epoch': 1.6}
{'loss': 2.0088, 'learning_rate': 0.000263757225433526, 'epoch': 1.6}
[WARNING|modeling_utils.py:388] 2022-03-23 19:18:50,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.0606, 'learning_rate': 0.0002635838150289017, 'epoch': 1.6}
{'loss': 1.9965, 'learning_rate': 0.00026341040462427744, 'epoch': 1.6}
[WARNING|modeling_utils.py:388] 2022-03-23 19:19:08,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.8923, 'learning_rate': 0.00026323699421965314, 'epoch': 1.61}
{'loss': 1.83, 'learning_rate': 0.0002630635838150289, 'epoch': 1.61}
{'loss': 1.8531, 'learning_rate': 0.0002628901734104046, 'epoch': 1.61}
32%|████████████████████████▌ | 719/2230 [2:20:38<5:06:49, 12.18s/it]
{'loss': 1.6699, 'learning_rate': 0.0002627167630057803, 'epoch': 1.61}
32%|████████████████████████▌ | 720/2230 [2:20:50<5:04:49, 12.11s/it]
{'loss': 1.6279, 'learning_rate': 0.000262543352601156, 'epoch': 1.61}
[WARNING|modeling_utils.py:388] 2022-03-23 19:20:20,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.6376, 'learning_rate': 0.0002621965317919075, 'epoch': 1.62}
[WARNING|modeling_utils.py:388] 2022-03-23 19:20:35,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
32%|████████████████████████▋ | 723/2230 [2:21:25<4:56:16, 11.80s/it]
{'loss': 1.4039, 'learning_rate': 0.0002618497109826589, 'epoch': 1.62}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:21:06,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.655, 'learning_rate': 0.0002615028901734104, 'epoch': 1.63}
33%|████████████████████████▊ | 727/2230 [2:22:10<4:46:38, 11.44s/it]
{'loss': 1.3933, 'learning_rate': 0.0002613294797687861, 'epoch': 1.63}
33%|████████████████████████▊ | 727/2230 [2:22:10<4:46:38, 11.44s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3983, 'learning_rate': 0.00026115606936416184, 'epoch': 1.63} 33%|████████████████████████▊ | 727/2230 [2:22:10<4:46:38, 11.44s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 33%|████████████████████████▊ | 727/2230 [2:22:10<4:46:38, 11.44s/it] Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:21:45,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:21:45,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:21:45,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:21:45,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.2188, 'learning_rate': 0.0002609826589595376, 'epoch': 1.63} [WARNING|modeling_bart.py:1051] 2022-03-23 19:21:45,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:21:55,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:21:55,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 33%|████████████████████████▉ | 730/2230 [2:22:43<4:36:11, 11.05s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 33%|████████████████████████▉ | 730/2230 [2:22:43<4:36:11, 11.05s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3292, 'learning_rate': 0.0002608092485549133, 'epoch': 1.64} 33%|████████████████████████▉ | 730/2230 [2:22:43<4:36:11, 11.05s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 33%|████████████████████████▉ | 730/2230 [2:22:43<4:36:11, 11.05s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 33%|████████████████████████▉ | 730/2230 [2:22:43<4:36:11, 11.05s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:09,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:09,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3526, 'learning_rate': 0.000260635838150289, 'epoch': 1.64} [WARNING|modeling_utils.py:388] 2022-03-23 19:22:09,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:22:09,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:09,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:19,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:19,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 1.3375, 'learning_rate': 0.0002604624277456647, 'epoch': 1.64} [WARNING|modeling_utils.py:388] 2022-03-23 19:22:19,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:22:25,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:22:25,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:29,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:22:29,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1941, 'learning_rate': 0.00026028901734104047, 'epoch': 1.64} [WARNING|modeling_utils.py:388] 2022-03-23 19:22:29,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:35,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:22:35,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
33%|█████████████████████████ | 734/2230 [2:23:24<4:16:17, 10.28s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 19:22:41,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:22:46,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:22:52,166 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:22:56,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:22:58,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.3232, 'learning_rate': 0.0002597687861271676, 'epoch': 1.65}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:23:02,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:23:04,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:23:06,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:23:08,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:23:10,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:14,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:16,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:18,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:20,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:21,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:23,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:25,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:28,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:30,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:31,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:34,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:36,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:39,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:40,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:43,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:45,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:46,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:48,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:50,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:53,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:55,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:56,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:23:58,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:02,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:05,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:09,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.9996, 'learning_rate': 0.00025786127167630056, 'epoch': 1.67}
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:12,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:16,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:19,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:23,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.0548, 'learning_rate': 0.00025768786127167625, 'epoch': 1.68}
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:26,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:30,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:33,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:36,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8795, 'learning_rate': 0.000257514450867052, 'epoch': 1.68}
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:40,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:43,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:46,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:50,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:24:53,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.8835, 'learning_rate': 0.00025716763005780344, 'epoch': 1.68}
34%|█████████████████████████▋ | 752/2230 [2:26:03<5:05:30, 12.40s/it]
{'loss': 2.3855, 'learning_rate': 0.00025699421965317914, 'epoch': 1.69}
{'loss': 2.2957, 'learning_rate': 0.0002568208092485549, 'epoch': 1.69}
{'loss': 2.001, 'learning_rate': 0.00025664739884393063, 'epoch': 1.69}
34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]
{'loss': 1.824, 'learning_rate': 0.0002564739884393063, 'epoch': 1.69}
{'loss': 1.9435, 'learning_rate': 0.00025630057803468207, 'epoch': 1.7}
{'loss': 1.5901, 'learning_rate': 0.0002561271676300578, 'epoch': 1.7}
{'loss': 1.447, 'learning_rate': 0.0002559537572254335, 'epoch': 1.7}
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.4363, 'learning_rate': 0.0002557803468208092, 'epoch': 1.7} 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▋ | 755/2230 [2:26:42<5:15:19, 12.83s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3356, 'learning_rate': 0.00025560693641618496, 'epoch': 1.7} 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 760/2230 [2:27:45<5:11:03, 12.70s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3838, 'learning_rate': 0.0002554335260115607, 'epoch': 1.71} 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 761/2230 [2:27:58<5:09:31, 12.64s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1423, 'learning_rate': 0.0002552601156069364, 'epoch': 1.71} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.2079, 'learning_rate': 0.0002550867052023121, 'epoch': 1.71} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1538, 'learning_rate': 0.00025491329479768784, 'epoch': 1.71} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0943, 'learning_rate': 0.0002547398843930636, 'epoch': 1.72} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1982, 'learning_rate': 0.0002545664739884393, 'epoch': 1.72} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1676, 'learning_rate': 0.00025439306358381503, 'epoch': 1.72} 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|█████████████████████████▉ | 762/2230 [2:28:10<5:07:19, 12.56s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|██████████████████████████▏ | 768/2230 [2:29:24<4:57:56, 12.23s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|██████████████████████████▏ | 768/2230 [2:29:24<4:57:56, 12.23s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0995, 'learning_rate': 0.0002542196531791907, 'epoch': 1.72} 34%|██████████████████████████▏ | 768/2230 [2:29:24<4:57:56, 12.23s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
34%|██████████████████████████▏ | 768/2230 [2:29:24<4:57:56, 12.23s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 34%|██████████████████████████▏ | 768/2230 [2:29:24<4:57:56, 12.23s/it]g-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:28:50,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:28:50,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:28:50,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:05:01,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
{'loss': 0.9632, 'learning_rate': 0.00025404624277456647, 'epoch': 1.72}
[WARNING|modeling_utils.py:388] 2022-03-23 19:28:50,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.0703, 'learning_rate': 0.00025387283236994216, 'epoch': 1.73}
{'loss': 0.9207, 'learning_rate': 0.0002536994219653179, 'epoch': 1.73}
{'loss': 0.9262, 'learning_rate': 0.0002535260115606936, 'epoch': 1.73}
{'loss': 0.9645, 'learning_rate': 0.00025335260115606935, 'epoch': 1.73}
{'loss': 1.0928, 'learning_rate': 0.00025317919075144504, 'epoch': 1.74}
35%|██████████████████████████▍ | 775/2230 [2:30:46<4:43:17, 11.68s/it]
{'loss': 0.9605, 'learning_rate': 0.0002530057803468208, 'epoch': 1.74}
{'loss': 1.0233, 'learning_rate': 0.0002528323699421965, 'epoch': 1.74}
35%|██████████████████████████▍ | 777/2230 [2:31:08<4:36:18, 11.41s/it]
{'loss': 0.9323, 'learning_rate': 0.00025265895953757223, 'epoch': 1.74}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:30:29,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:30:35,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8583, 'learning_rate': 0.000252485549132948, 'epoch': 1.74}
[WARNING|modeling_utils.py:388] 2022-03-23 19:30:43,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:30:47,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.9294, 'learning_rate': 0.00025213872832369937, 'epoch': 1.75}
[WARNING|modeling_utils.py:388] 2022-03-23 19:31:01,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.9475, 'learning_rate': 0.0002519653179190751, 'epoch': 1.75}
[WARNING|modeling_utils.py:388] 2022-03-23 19:31:11,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:20,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:26,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.8889, 'learning_rate': 0.00025161849710982656, 'epoch': 1.76}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:32,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:31:36,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8678, 'learning_rate': 0.0002514450867052023, 'epoch': 1.76}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:40,930 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:44,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
35%|██████████████████████████▊ | 785/2230 [2:32:30<3:58:59, 9.92s/it][WARNING|modeling_bart.py:1051] 2022-03-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.8449, 'learning_rate': 0.00025127167630057805, 'epoch': 1.76} [WARNING|modeling_utils.py:388] 2022-03-23 19:31:50,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-23 19:31:54,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:31:56,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:31:58,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:00,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:03,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:05,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:07,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:09,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:11,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:15,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:16,908 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:18,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:20,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:23,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:25,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:27,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:30,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:31,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:33,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:35,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:37,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:39,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:41,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:44,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:46,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:47,993 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:49,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:52,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:52,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:32:56,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:32:59,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:03,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:06,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 6.4464, 'learning_rate': 0.00024919075144508665, 'epoch': 1.79} [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:10,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:13,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:17,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:20,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.0751, 'learning_rate': 0.0002490173410404624, 'epoch': 1.79} [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:23,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:27,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:30,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:34,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:37,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:40,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:44,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:47,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:51,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:33:51,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:51,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:33:51,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 2.1743, 'learning_rate': 0.0002484971098265896, 'epoch': 1.8} 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.9198, 'learning_rate': 0.00024832369942196533, 'epoch': 1.8} 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 801/2230 [2:34:46<4:49:58, 12.18s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.6071, 'learning_rate': 0.0002479768786127167, 'epoch': 1.8} 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.511, 'learning_rate': 0.00024780346820809247, 'epoch': 1.8} 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▎ | 803/2230 [2:35:13<5:02:56, 12.74s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 1.464, 'learning_rate': 0.0002476300578034682, 'epoch': 1.81} 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.1126, 'learning_rate': 0.0002474566473988439, 'epoch': 1.81} 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0599, 'learning_rate': 0.0002472832369942196, 'epoch': 1.81} 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 1.0413, 'learning_rate': 0.00024710982658959535, 'epoch': 1.81} 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0306, 'learning_rate': 0.0002469364161849711, 'epoch': 1.82} 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▍ | 806/2230 [2:35:52<5:06:57, 12.93s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.943, 'learning_rate': 0.0002467630057803468, 'epoch': 1.82} 36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 811/2230 [2:36:55<4:59:12, 12.65s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.9814, 'learning_rate': 0.00024658959537572254, 'epoch': 1.82} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8165, 'learning_rate': 0.00024641618497109823, 'epoch': 1.82} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8491, 'learning_rate': 0.000246242774566474, 'epoch': 1.83} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.9818, 'learning_rate': 0.0002460693641618497, 'epoch': 1.83} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8564, 'learning_rate': 0.0002458959537572254, 'epoch': 1.83} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8471, 'learning_rate': 0.0002457225433526011, 'epoch': 1.83} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.7855, 'learning_rate': 0.00024554913294797686, 'epoch': 1.83} 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 36%|███████████████████████████▋ | 812/2230 [2:37:08<4:58:18, 12.62s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.7538, 'learning_rate': 0.0002452023121387283, 'epoch': 1.84} 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8812, 'learning_rate': 0.000245028901734104, 'epoch': 1.84} 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.8958, 'learning_rate': 0.00024485549132947975, 'epoch': 1.84} 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|███████████████████████████▉ | 819/2230 [2:38:33<4:44:43, 12.11s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.7152, 'learning_rate': 0.0002446820809248555, 'epoch': 1.85} 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.7971, 'learning_rate': 0.0002445086705202312, 'epoch': 1.85} 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it] Setting `use_cache=False`...e computed-23 19:31:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.7203, 'learning_rate': 0.0002443352601156069, 'epoch': 1.85}
37%|████████████████████████████ | 823/2230 [2:39:20<4:36:57, 11.81s/it]
{'loss': 0.7015, 'learning_rate': 0.00024416184971098263, 'epoch': 1.85}
37%|████████████████████████████▏ | 827/2230 [2:40:06<4:28:01, 11.46s/it]
{'loss': 0.9212, 'learning_rate': 0.00024398843930635838, 'epoch': 1.85}
[WARNING|modeling_utils.py:388] 2022-03-23 19:39:34,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.6807, 'learning_rate': 0.00024381502890173407, 'epoch': 1.86}
[WARNING|modeling_utils.py:388] 2022-03-23 19:39:38,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:39:42,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:39:46,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:39:50,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
37%|████████████████████████████▎ | 830/2230 [2:40:39<4:17:32, 11.04s/it]
{'loss': 0.7264, 'learning_rate': 0.00024346820809248554, 'epoch': 1.86}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:40:00,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:08,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:15,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.757, 'learning_rate': 0.00024312138728323698, 'epoch': 1.87}
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:25,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.687, 'learning_rate': 0.00024294797687861267, 'epoch': 1.87}
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:31,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
37%|████████████████████████████▍ | 834/2230 [2:41:19<3:59:47, 10.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:40:39,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:43,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:45,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:40:49,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:53,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:56,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:40:58,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:01,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:04,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:06,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:08,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:10,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:12,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:14,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:15,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:17,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:19,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:21,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:24,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:26,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:27,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:30,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:32,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:35,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:36,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:38,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:40,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:42,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:44,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:41:46,415 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:49,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:49,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:50,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:52,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:52,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:55,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:55,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:58,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:41:58,865 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:02,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:05,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:05,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:05,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:09,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:09,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:12,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:12,768 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:16,178 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:19,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:19,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 3.3921, 'learning_rate': 0.00024034682080924854, 'epoch': 1.9} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:23,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:23,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:26,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:29,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:29,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:33,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:42:33,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.9051, 'learning_rate': 0.00024017341040462423, 'epoch': 1.9} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:36,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:40,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:40,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:43,447 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:43,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.4066, 'learning_rate': 0.00023999999999999998, 'epoch': 1.91} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3678, 'learning_rate': 0.0002398265895953757, 'epoch': 1.91} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.3079, 'learning_rate': 0.00023965317919075142, 'epoch': 1.91} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0537, 'learning_rate': 0.00023947976878612714, 'epoch': 1.91} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 1.0467, 'learning_rate': 0.0002393063583815029, 'epoch': 1.91} [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:42:46,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.9966, 'learning_rate': 0.0002391329479768786, 'epoch': 1.92}
{'loss': 0.8728, 'learning_rate': 0.0002389595375722543, 'epoch': 1.92}
{'loss': 0.8287, 'learning_rate': 0.00023878612716763002, 'epoch': 1.92}
38%|█████████████████████████████▏ | 858/2230 [2:45:17<4:52:54, 12.81s/it]
{'loss': 0.891, 'learning_rate': 0.00023861271676300577, 'epoch': 1.92}
39%|█████████████████████████████▎ | 859/2230 [2:45:29<4:50:58, 12.73s/it]
{'loss': 0.6969, 'learning_rate': 0.0002384393063583815, 'epoch': 1.93}
39%|█████████████████████████████▎ | 860/2230 [2:45:42<4:48:56, 12.65s/it]
{'loss': 0.7537, 'learning_rate': 0.0002382658959537572, 'epoch': 1.93}
39%|█████████████████████████████▎ | 861/2230 [2:45:54<4:47:05, 12.58s/it]
{'loss': 0.7415, 'learning_rate': 0.0002380924855491329, 'epoch': 1.93}
39%|█████████████████████████████▍ | 862/2230 [2:46:06<4:45:15, 12.51s/it]
{'loss': 0.6404, 'learning_rate': 0.00023791907514450865, 'epoch': 1.93}
{'loss': 0.6738, 'learning_rate': 0.00023774566473988437, 'epoch': 1.93}
{'loss': 0.7438, 'learning_rate': 0.0002375722543352601, 'epoch': 1.94}
[WARNING|modeling_utils.py:388] 2022-03-23 19:45:53,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:45:59,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.5957, 'learning_rate': 0.00023739884393063582, 'epoch': 1.94}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:46:07,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.5418, 'learning_rate': 0.00023722543352601156, 'epoch': 1.94}
{'loss': 0.6583, 'learning_rate': 0.00023705202312138726, 'epoch': 1.94}
{'loss': 0.7395, 'learning_rate': 0.00023687861271676298, 'epoch': 1.95}
[WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.6711, 'learning_rate': 0.0002367052023121387, 'epoch': 1.95} [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:46:38,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5824, 'learning_rate': 0.00023653179190751445, 'epoch': 1.95} 39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▋ | 870/2230 [2:47:43<4:28:58, 11.87s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:11,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:11,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.6412, 'learning_rate': 0.00023635838150289017, 'epoch': 1.95} [WARNING|modeling_utils.py:388] 2022-03-23 19:47:11,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:47:11,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:11,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.5811, 'learning_rate': 0.00023618497109826586, 'epoch': 1.96} [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.611, 'learning_rate': 0.00023601156069364158, 'epoch': 1.96} [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:47:21,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▊ | 874/2230 [2:48:28<4:18:13, 11.43s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▊ | 874/2230 [2:48:28<4:18:13, 11.43s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.6409, 'learning_rate': 0.00023583815028901733, 'epoch': 1.96} [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.6914, 'learning_rate': 0.00023566473988439305, 'epoch': 1.96} [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:47:48,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
39%|█████████████████████████████▊ | 876/2230 [2:48:50<4:13:41, 11.24s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▊ | 876/2230 [2:48:50<4:13:41, 11.24s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 39%|█████████████████████████████▊ | 876/2230 [2:48:50<4:13:41, 11.24s/it]g-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:12,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:12,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:12,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:12,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5725, 'learning_rate': 0.0002353179190751445, 'epoch': 1.97} [WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5752, 'learning_rate': 0.00023514450867052024, 'epoch': 1.97} [WARNING|modeling_utils.py:388] 2022-03-23 19:48:20,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5661, 'learning_rate': 0.00023497109826589593, 'epoch': 1.97} [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:33,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:44,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:44,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:44,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.51, 'learning_rate': 0.00023479768786127165, 'epoch': 1.97} [WARNING|modeling_utils.py:388] 2022-03-23 19:48:51,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 19:48:51,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:48:55,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 40%|██████████████████████████████ | 881/2230 [2:49:41<3:51:00, 10.27s/it] Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 40%|██████████████████████████████ | 881/2230 [2:49:41<3:51:00, 10.27s/it] Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5564, 'learning_rate': 0.00023462427745664737, 'epoch': 1.98} [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:01,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:01,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:01,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:01,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:40:36,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 40%|██████████████████████████████ | 882/2230 [2:49:51<3:45:28, 10.04s/it][WARNING|modeling_bart.py:1051] 2022-03-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:09,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:11,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:49:11,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 19:49:15,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.5626, 'learning_rate': 0.00023427745664739884, 'epoch': 1.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:19,504 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:21,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:25,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:27,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:29,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:31,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:33,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:35,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:36,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:38,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:41,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:43,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:44,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:47,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:49,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:50,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:53,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:55,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:49:58,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:00,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:02,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:03,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:05,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:07,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:10,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:14,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:17,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:21,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:24,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:28,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:31,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:34,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.8651, 'learning_rate': 0.00023236994219653174, 'epoch': 2.0}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:38,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:41,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:45,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:48,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.2221, 'learning_rate': 0.0002321965317919075, 'epoch': 2.01}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:51,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:55,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9329, 'learning_rate': 0.0002320231213872832, 'epoch': 2.01}
{'loss': 0.8674, 'learning_rate': 0.00023184971098265893, 'epoch': 2.01}
{'loss': 0.8294, 'learning_rate': 0.00023167630057803465, 'epoch': 2.01}
{'loss': 0.807, 'learning_rate': 0.0002315028901734104, 'epoch': 2.02} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.6646, 'learning_rate': 0.00023132947976878612, 'epoch': 2.02} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5948, 'learning_rate': 0.00023115606936416181, 'epoch': 2.02} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.7274, 'learning_rate': 0.00023098265895953754, 'epoch': 2.02} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.6304, 'learning_rate': 0.00023080924855491328, 'epoch': 2.02} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.606, 'learning_rate': 0.000230635838150289, 'epoch': 2.03} [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:50:58,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
41%|██████████████████████████████▊ | 905/2230 [2:53:44<4:42:58, 12.81s/it]
{'loss': 0.5263, 'learning_rate': 0.00023046242774566472, 'epoch': 2.03}
41%|██████████████████████████████▉ | 906/2230 [2:53:57<4:41:13, 12.74s/it]
{'loss': 0.4577, 'learning_rate': 0.00023028901734104042, 'epoch': 2.03}
{'loss': 0.4746, 'learning_rate': 0.00023011560693641617, 'epoch': 2.03}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5186, 'learning_rate': 0.0002299421965317919, 'epoch': 2.04} [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:53:33,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4732, 'learning_rate': 0.0002297687861271676, 'epoch': 2.04}
{'loss': 0.5486, 'learning_rate': 0.00022959537572254333, 'epoch': 2.04}
{'loss': 0.5361, 'learning_rate': 0.00022942196531791908, 'epoch': 2.04}
{'loss': 0.3955, 'learning_rate': 0.00022924855491329477, 'epoch': 2.04}
{'loss': 0.4232, 'learning_rate': 0.0002290751445086705, 'epoch': 2.05}
{'loss': 0.4127, 'learning_rate': 0.0002289017341040462, 'epoch': 2.05}
41%|███████████████████████████████▏ | 915/2230 [2:55:48<4:27:49, 12.22s/it]
{'loss': 0.5143, 'learning_rate': 0.00022872832369942196, 'epoch': 2.05}
41%|███████████████████████████████▏ | 916/2230 [2:55:59<4:25:21, 12.12s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 19:55:24,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4304, 'learning_rate': 0.00022838150289017337, 'epoch': 2.06}
{'loss': 0.4103, 'learning_rate': 0.0002282080924855491, 'epoch': 2.06}
41%|███████████████████████████████▎ | 919/2230 [2:56:34<4:16:58, 11.76s/it]
{'loss': 0.5457, 'learning_rate': 0.00022803468208092484, 'epoch': 2.06}
{'loss': 0.3894, 'learning_rate': 0.00022786127167630056, 'epoch': 2.06}
41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]
{'loss': 0.4171, 'learning_rate': 0.00022768786127167628, 'epoch': 2.07} 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.453, 'learning_rate': 0.00022751445086705198, 'epoch': 2.07} 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 921/2230 [2:56:57<4:11:28, 11.53s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 41%|███████████████████████████████▍ | 923/2230 [2:57:19<4:06:32, 11.32s/it]g-point operations will not be computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
{'loss': 0.4221, 'learning_rate': 0.00022734104046242772, 'epoch': 2.07}
[WARNING|modeling_utils.py:388] 2022-03-23 19:56:45,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3982, 'learning_rate': 0.00022716763005780344, 'epoch': 2.07}
[WARNING|modeling_utils.py:388] 2022-03-23 19:56:49,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:56:54,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4735, 'learning_rate': 0.00022699421965317917, 'epoch': 2.07}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:57:04,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4474, 'learning_rate': 0.00022682080924855489, 'epoch': 2.08}
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:16,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4077, 'learning_rate': 0.00022664739884393063, 'epoch': 2.08}
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:22,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
42%|███████████████████████████████▋ | 928/2230 [2:58:12<3:49:05, 10.56s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:30,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:57:34,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:57:40,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:44,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:57:49,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:53,148 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 19:57:55,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4089, 'learning_rate': 0.00022595375722543352, 'epoch': 2.09}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:57:59,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 19:58:03,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:07,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:09,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:11,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:13,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:15,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:17,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:19,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:21,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:23,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:25,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:27,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:28,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:30,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:34,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:35,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:37,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:40,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:42,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:43,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:44,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:47,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:49,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:51,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:54,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:56,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:58:58,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:00,263 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:04,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2657, 'learning_rate': 0.00022404624277456644, 'epoch': 2.11}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:07,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:11,320 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:14,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:18,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.5798, 'learning_rate': 0.0002238728323699422, 'epoch': 2.11}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:21,857 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:25,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:28,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:32,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.1552, 'learning_rate': 0.0002236994219653179, 'epoch': 2.12}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:35,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:38,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:42,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:45,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:49,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:52,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:55,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.6321, 'learning_rate': 0.00022317919075144507, 'epoch': 2.12}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.656, 'learning_rate': 0.0002230057803468208, 'epoch': 2.13}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.613, 'learning_rate': 0.00022283236994219652, 'epoch': 2.13}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4863, 'learning_rate': 0.00022265895953757224, 'epoch': 2.13}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.6139, 'learning_rate': 0.00022248554913294798, 'epoch': 2.13}
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5303, 'learning_rate': 0.00022231213872832368, 'epoch': 2.13} [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4703, 'learning_rate': 0.0002221387283236994, 'epoch': 2.14} [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.5082, 'learning_rate': 0.00022196531791907512, 'epoch': 2.14} [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 19:59:59,082 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3705, 'learning_rate': 0.00022179190751445087, 'epoch': 2.14} 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 955/2230 [3:02:41<4:31:37, 12.78s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 956/2230 [3:02:53<4:29:54, 12.71s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
43%|████████████████████████████████▌ | 957/2230 [3:03:06<4:28:11, 12.64s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3629, 'learning_rate': 0.000221271676300578, 'epoch': 2.15} Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3247, 'learning_rate': 0.00022109826589595375, 'epoch': 2.15} 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 959/2230 [3:03:31<4:24:44, 12.50s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3734, 'learning_rate': 0.00022092485549132947, 'epoch': 2.15} 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4231, 'learning_rate': 0.0002207514450867052, 'epoch': 2.15} 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▋ | 960/2230 [3:03:43<4:22:54, 12.42s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.369, 'learning_rate': 0.00022057803468208088, 'epoch': 2.16} 43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 962/2230 [3:04:07<4:20:18, 12.32s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3606, 'learning_rate': 0.00022040462427745663, 'epoch': 2.16} 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 963/2230 [3:04:20<4:21:25, 12.38s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 964/2230 [3:04:32<4:19:13, 12.29s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 43%|████████████████████████████████▊ | 964/2230 [3:04:32<4:19:13, 12.29s/it] Setting `use_cache=False`...e computed-23 19:49:07,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4322, 'learning_rate': 0.00022023121387283235, 'epoch': 2.16}
43%|████████████████████████████████▊ | 964/2230 [3:04:32<4:19:13, 12.29s/it]
{'loss': 0.3928, 'learning_rate': 0.00022005780346820807, 'epoch': 2.16}
[WARNING|modeling_utils.py:388] 2022-03-23 20:04:11,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4072, 'learning_rate': 0.0002198843930635838, 'epoch': 2.17}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:04:19,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
43%|████████████████████████████████▉ | 967/2230 [3:05:07<4:11:30, 11.95s/it]
{'loss': 0.3673, 'learning_rate': 0.00021971098265895954, 'epoch': 2.17}
[WARNING|modeling_utils.py:388] 2022-03-23 20:04:32,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3822, 'learning_rate': 0.00021953757225433524, 'epoch': 2.17}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:04:44,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3869, 'learning_rate': 0.00021936416184971096, 'epoch': 2.17}
43%|█████████████████████████████████ | 970/2230 [3:05:42<4:05:07, 11.67s/it]
{'loss': 0.3596, 'learning_rate': 0.00021919075144508668, 'epoch': 2.17}
43%|█████████████████████████████████ | 970/2230 [3:05:42<4:05:07, 11.67s/it]
{'loss': 0.4073, 'learning_rate': 0.00021901734104046243, 'epoch': 2.18}
{'loss': 0.3531, 'learning_rate': 0.00021884393063583815, 'epoch': 2.18}
44%|█████████████████████████████████▏ | 973/2230 [3:06:15<3:56:09, 11.27s/it]
{'loss': 0.4388, 'learning_rate': 0.00021867052023121384, 'epoch': 2.18}
44%|█████████████████████████████████▏ | 973/2230 [3:06:15<3:56:09, 11.27s/it]
{'loss': 0.3399, 'learning_rate': 0.00021849710982658956, 'epoch': 2.18}
[WARNING|modeling_utils.py:388] 2022-03-23 20:05:47,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:05:47,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.286, 'learning_rate': 0.0002183236994219653, 'epoch': 2.19}
[WARNING|modeling_utils.py:388] 2022-03-23 20:05:59,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:05:59,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
44%|█████████████████████████████████▎ | 976/2230 [3:06:48<3:48:02, 10.91s/it]
{'loss': 0.3678, 'learning_rate': 0.00021815028901734103, 'epoch': 2.19}
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:09,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
44%|█████████████████████████████████▎ | 977/2230 [3:06:58<3:43:41, 10.71s/it]
{'loss': 0.402, 'learning_rate': 0.00021797687861271675, 'epoch': 2.19}
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:20,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
44%|█████████████████████████████████▎ | 978/2230 [3:07:08<3:39:35, 10.52s/it]
{'loss': 0.4259, 'learning_rate': 0.00021780346820809247, 'epoch': 2.19}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:06:28,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:32,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3776, 'learning_rate': 0.00021763005780346822, 'epoch': 2.2}
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:38,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:40,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3641, 'learning_rate': 0.0002174566473988439, 'epoch': 2.2}
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:46,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 20:06:50,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
44%|█████████████████████████████████▍ | 981/2230 [3:07:36<3:23:55, 9.80s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:56,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:06:58,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:01,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:03,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:05,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:07,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:09,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:11,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:13,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:15,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:16,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:18,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:20,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:22,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:24,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:27,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:29,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:30,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:32,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:35,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:36,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:39,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:40,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:43,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:45,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:47,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:49,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:51,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:53,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:56,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:07:57,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4064, 'learning_rate': 0.00021537572254335259, 'epoch': 2.22}
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:01,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:04,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:08,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:11,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.685, 'learning_rate': 0.0002152023121387283, 'epoch': 2.23}
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:15,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:18,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:22,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:25,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.1262, 'learning_rate': 0.00021502890173410403, 'epoch': 2.23}
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:29,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:32,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:36,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:39,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:42,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:46,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:49,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:52,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.7312, 'learning_rate': 0.00021468208092485547, 'epoch': 2.23}
[WARNING|modeling_utils.py:388] 2022-03-23 20:08:56,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.5592, 'learning_rate': 0.0002145086705202312, 'epoch': 2.24}
{'loss': 0.6647, 'learning_rate': 0.0002143352601156069, 'epoch': 2.24}
{'loss': 0.5828, 'learning_rate': 0.00021416184971098263, 'epoch': 2.24}
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642
...timate the number of tokens of the input, floating-point operations will not be computed
...-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642timate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
[WARNING|modeling_bart.py:1051] 2022-03-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[INFO|trainer.py:2366] 2022-03-23 20:09:46,620 >> Num examples = 2642
03/23/2022 20:19:04 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.5549545288085938, 'eval_wer': 0.17532725109083697, 'eval_runtime': 557.4926, 'eval_samples_per_second': 4.739, 'eval_steps_per_second': 0.594, 'epoch': 2.24}
{'loss': 0.5099, 'learning_rate': 0.0002138150289017341, 'epoch': 2.24}
{'loss': 0.4194, 'learning_rate': 0.0002136416184971098, 'epoch': 2.25}
{'loss': 0.4526, 'learning_rate': 0.00021346820809248551, 'epoch': 2.25}
{'loss': 0.499, 'learning_rate': 0.00021329479768786126, 'epoch': 2.25}
{'loss': 0.4094, 'learning_rate': 0.00021312138728323698, 'epoch': 2.25}
{'loss': 0.3473, 'learning_rate': 0.0002129479768786127, 'epoch': 2.26}
{'loss': 0.4647, 'learning_rate': 0.0002127745664739884, 'epoch': 2.26}
{'loss': 0.3955, 'learning_rate': 0.00021260115606936414, 'epoch': 2.26}
{'loss': 0.4039, 'learning_rate': 0.00021242774566473987, 'epoch': 2.26}
g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4174, 'learning_rate': 0.00021225433526011559, 'epoch': 2.26} g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.3858, 'learning_rate': 0.0002120809248554913, 'epoch': 2.27} g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4267, 'learning_rate': 0.00021190751445086705, 'epoch': 2.27}
{'loss': 0.4297, 'learning_rate': 0.00021173410404624275, 'epoch': 2.27}
{'loss': 0.4191, 'learning_rate': 0.00021156069364161847, 'epoch': 2.27}
 46%|██████████████████████████████████▏ | 1015/2230 [3:24:39<4:38:41, 13.76s/it]
{'loss': 0.402, 'learning_rate': 0.0002113872832369942, 'epoch': 2.28}
 46%|██████████████████████████████████▏ | 1016/2230 [3:24:50<4:27:08, 13.20s/it]
{'loss': 0.3399, 'learning_rate': 0.00021121387283236994, 'epoch': 2.28}
{'loss': 0.3722, 'learning_rate': 0.00021104046242774566, 'epoch': 2.28}
[WARNING|modeling_utils.py:388] 2022-03-23 20:24:21,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3687, 'learning_rate': 0.00021086705202312135, 'epoch': 2.28}
 46%|██████████████████████████████████▎ | 1019/2230 [3:25:26<4:06:27, 12.21s/it]
{'loss': 0.4594, 'learning_rate': 0.00021069364161849707, 'epoch': 2.28}
{'loss': 0.3473, 'learning_rate': 0.00021052023121387282, 'epoch': 2.29}
[WARNING|modeling_utils.py:388] 2022-03-23 20:24:56,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3437, 'learning_rate': 0.00021034682080924854, 'epoch': 2.29}
[WARNING|modeling_utils.py:388] 2022-03-23 20:25:18,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3963, 'learning_rate': 0.00020999999999999998, 'epoch': 2.29}
[WARNING|modeling_utils.py:388] 2022-03-23 20:25:36,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3373, 'learning_rate': 0.00020982658959537573, 'epoch': 2.3}
[WARNING|modeling_utils.py:388] 2022-03-23 20:25:45,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3339, 'learning_rate': 0.00020965317919075142, 'epoch': 2.3}
[WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3122, 'learning_rate': 0.00020947976878612714, 'epoch': 2.3} [WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:26:01,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:14,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.38, 'learning_rate': 0.0002091329479768786, 'epoch': 2.3} [WARNING|modeling_utils.py:388] 2022-03-23 20:26:25,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:25,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:25,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:25,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:25,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:34,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:26:34,114 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:38,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:40,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:40,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:42,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:42,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:42,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:48,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:26:48,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.3022, 'learning_rate': 0.00020861271676300575, 'epoch': 2.31}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:26:52,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 20:26:56,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:26:58,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:00,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:02,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:04,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:06,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:09,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:10,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:12,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:14,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:16,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:18,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:20,274 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:22,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:23,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:27,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:28,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:30,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:33,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:34,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:36,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:39,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:40,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:43,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:45,266 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:47,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:49,306 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:51,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:27:52,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:27:55,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:27:55,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:27:56,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:27:56,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:27:59,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:28:03,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:28:03,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:28:06,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:10,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:14,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:17,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:20,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:24,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.9364, 'learning_rate': 0.00020635838150289017, 'epoch': 2.34}
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:27,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:31,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:34,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:38,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.5846, 'learning_rate': 0.0002061849710982659, 'epoch': 2.34}
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:41,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:44,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:48,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:51,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:28:55,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
47%|███████████████████████████████████▏ | 1047/2230 [3:29:49<3:58:07, 12.08s/it]
{'loss': 0.4982, 'learning_rate': 0.0002058381502890173, 'epoch': 2.35}
{'loss': 0.4853, 'learning_rate': 0.00020566473988439305, 'epoch': 2.35}
{'loss': 0.4355, 'learning_rate': 0.00020549132947976877, 'epoch': 2.35}
{'loss': 0.5347, 'learning_rate': 0.0002053179190751445, 'epoch': 2.35}
47%|███████████████████████████████████▎ | 1051/2230 [3:30:42<4:15:52, 13.02s/it]
{'loss': 0.4829, 'learning_rate': 0.00020514450867052021, 'epoch': 2.36}
{'loss': 0.3672, 'learning_rate': 0.00020497109826589596, 'epoch': 2.36}
{'loss': 0.4293, 'learning_rate': 0.00020479768786127166, 'epoch': 2.36}
{'loss': 0.4222, 'learning_rate': 0.00020462427745664738, 'epoch': 2.36}
47%|███████████████████████████████████▎ | 1051/2230 [3:30:42<4:15:52, 13.02s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▎ | 1051/2230 [3:30:42<4:15:52, 13.02s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▎ | 1051/2230 [3:30:42<4:15:52, 13.02s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4015, 'learning_rate': 0.0002044508670520231, 'epoch': 2.37} 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▍ | 1055/2230 [3:31:34<4:12:35, 12.90s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4734, 'learning_rate': 0.00020427745664739885, 'epoch': 2.37} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3764, 'learning_rate': 0.00020410404624277457, 'epoch': 2.37} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4224, 'learning_rate': 0.00020393063583815026, 'epoch': 2.37} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3608, 'learning_rate': 0.00020375722543352598, 'epoch': 2.37} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3688, 'learning_rate': 0.00020358381502890173, 'epoch': 2.38} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3761, 'learning_rate': 0.00020341040462427745, 'epoch': 2.38} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3932, 'learning_rate': 0.00020323699421965317, 'epoch': 2.38} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4005, 'learning_rate': 0.00020306358381502886, 'epoch': 2.38} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4029, 'learning_rate': 0.0002028901734104046, 'epoch': 2.39} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2571, 'learning_rate': 0.00020271676300578033, 'epoch': 2.39} 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 47%|███████████████████████████████████▌ | 1056/2230 [3:31:47<4:11:04, 12.83s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 48%|███████████████████████████████████▊ | 1066/2230 [3:33:49<3:55:23, 12.13s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 48%|███████████████████████████████████▊ | 1066/2230 [3:33:49<3:55:23, 12.13s/it]g-point operations will not be computed-23 20:06:53,162 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
48%|███████████████████████████████████▊ | 1066/2230 [3:33:49<3:55:23, 12.13s/it]
{'loss': 0.3363, 'learning_rate': 0.00020236994219653177, 'epoch': 2.39}
{'loss': 0.3423, 'learning_rate': 0.00020219653179190752, 'epoch': 2.39}
48%|███████████████████████████████████▉ | 1069/2230 [3:34:24<3:48:57, 11.83s/it]
{'loss': 0.3071, 'learning_rate': 0.00020202312138728321, 'epoch': 2.4}
{'loss': 0.3336, 'learning_rate': 0.00020184971098265893, 'epoch': 2.4}
48%|████████████████████████████████████ | 1071/2230 [3:34:47<3:44:27, 11.62s/it]
{'loss': 0.3494, 'learning_rate': 0.00020167630057803466, 'epoch': 2.4}
[WARNING|modeling_utils.py:388] 2022-03-23 20:34:07,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3916, 'learning_rate': 0.0002015028901734104, 'epoch': 2.4}
48%|████████████████████████████████████ | 1073/2230 [3:35:10<3:39:31, 11.38s/it]
{'loss': 0.3232, 'learning_rate': 0.00020132947976878612, 'epoch': 2.41}
{'loss': 0.325, 'learning_rate': 0.00020115606936416184, 'epoch': 2.41}
48%|████████████████████████████████████▏ | 1075/2230 [3:35:32<3:37:33, 11.30s/it]
{'loss': 0.2942, 'learning_rate': 0.00020098265895953754, 'epoch': 2.41}
{'loss': 0.3155, 'learning_rate': 0.00020080924855491329, 'epoch': 2.41}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:35:04,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:10,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3347, 'learning_rate': 0.000200635838150289, 'epoch': 2.41}
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:20,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3422, 'learning_rate': 0.00020046242774566473, 'epoch': 2.42}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:35:29,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3474, 'learning_rate': 0.00020028901734104045, 'epoch': 2.42}
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:37,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:43,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:45,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████▎ | 1081/2230 [3:36:33<3:14:09, 10.14s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:51,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:35:53,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 20:35:57,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3231, 'learning_rate': 0.0001997687861271676, 'epoch': 2.43}
[WARNING|modeling_utils.py:388] 2022-03-23 20:36:01,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:05,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
49%|████████████████████████████████████▍ | 1083/2230 [3:36:51<3:00:56, 9.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:07,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:09,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:11,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:13,574 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-23 20:36:07,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:13,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:07,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▍ | 1084/2230 [3:36:59<2:53:00, 9.06s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:15,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:17,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:15,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:19,356 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:15,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:21,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:15,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:21,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:15,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▍ | 1085/2230 [3:37:06<2:43:47, 8.58s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:23,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:24,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:23,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:28,098 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 20:36:23,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:28,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:23,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▌ | 1086/2230 [3:37:13<2:33:27, 8.05s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:31,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:32,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:32,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▌ | 1087/2230 [3:37:19<2:22:39, 7.49s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:35,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:37,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:35,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:40,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:35,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:40,278 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 20:36:35,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▌ | 1088/2230 [3:37:25<2:12:47, 6.98s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:41,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:44,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:41,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:44,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:41,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▋ | 1089/2230 [3:37:30<1:59:48, 6.30s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:46,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 20:36:48,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:46,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:48,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:46,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▋ | 1090/2230 [3:37:34<1:46:41, 5.62s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:50,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:52,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:50,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:52,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:50,258 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:54,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:53,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:56,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:53,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:36:56,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:53,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▋ | 1092/2230 [3:37:41<1:24:27, 4.45s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
49%|████████████████████████████████████▋ | 1092/2230 [3:37:41<1:24:27, 4.45s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:01,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:05,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:05,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:08,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1093/2230 [3:37:55<2:19:52, 7.38s/it] Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1093/2230 [3:37:55<2:19:52, 7.38s/it] Setting `use_cache=False`...1] 2022-03-23 20:36:57,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1093/2230 [3:37:55<2:19:52, 7.38s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1093/2230 [3:37:55<2:19:52, 7.38s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:15,471 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:18,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:18,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:22,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:22,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1094/2230 [3:38:09<2:55:37, 9.28s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:12,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1094/2230 [3:38:09<2:55:37, 9.28s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:29,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:29,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:32,552 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:32,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:35,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:35,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:25,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1095/2230 [3:38:22<3:20:38, 10.61s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1095/2230 [3:38:22<3:20:38, 10.61s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:42,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:46,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:46,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:49,393 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1096/2230 [3:38:36<3:36:14, 11.44s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1096/2230 [3:38:36<3:36:14, 11.44s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:39,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▊ | 1096/2230 [3:38:36<3:36:14, 11.44s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4258, 'learning_rate': 0.00019716763005780345, 'epoch': 2.46} [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:37:56,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4349, 'learning_rate': 0.00019699421965317917, 'epoch': 2.46} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4175, 'learning_rate': 0.0001968208092485549, 'epoch': 2.46} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.453, 'learning_rate': 0.0001966473988439306, 'epoch': 2.47} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.4161, 'learning_rate': 0.00019647398843930636, 'epoch': 2.47} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4631, 'learning_rate': 0.00019630057803468208, 'epoch': 2.47} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3479, 'learning_rate': 0.00019612716763005777, 'epoch': 2.47} 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 49%|████████████████████████████████████▉ | 1098/2230 [3:39:02<3:54:02, 12.41s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3392, 'learning_rate': 0.0001959537572254335, 'epoch': 2.48} 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3378, 'learning_rate': 0.00019578034682080924, 'epoch': 2.48} 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3503, 'learning_rate': 0.00019560693641618496, 'epoch': 2.48} 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3082, 'learning_rate': 0.00019543352601156068, 'epoch': 2.48} 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3127, 'learning_rate': 0.00019526011560693637, 'epoch': 2.48} 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▏ | 1104/2230 [3:40:21<4:01:49, 12.89s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▎ | 1109/2230 [3:41:24<3:55:22, 12.60s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▎ | 1109/2230 [3:41:24<3:55:22, 12.60s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3518, 'learning_rate': 0.00019508670520231212, 'epoch': 2.49} 50%|█████████████████████████████████████▎ | 1109/2230 [3:41:24<3:55:22, 12.60s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▎ | 1109/2230 [3:41:24<3:55:22, 12.60s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 50%|█████████████████████████████████████▎ | 1109/2230 [3:41:24<3:55:22, 12.60s/it] Setting `use_cache=False`...1] 2022-03-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
50%|█████████████████████████████████████▎ | 1110/2230 [3:41:36<3:54:09, 12.54s/it]
{'loss': 0.2584, 'learning_rate': 0.00019491329479768784, 'epoch': 2.49}
50%|█████████████████████████████████████▎ | 1111/2230 [3:41:49<3:52:41, 12.48s/it]
{'loss': 0.2994, 'learning_rate': 0.00019473988439306356, 'epoch': 2.49}
[WARNING|modeling_utils.py:388] 2022-03-23 20:41:16,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3466, 'learning_rate': 0.00019456647398843928, 'epoch': 2.49}
{'loss': 0.3337, 'learning_rate': 0.00019439306358381503, 'epoch': 2.5}
50%|█████████████████████████████████████▍ | 1114/2230 [3:42:25<3:49:31, 12.34s/it]
{'loss': 0.3306, 'learning_rate': 0.00019421965317919073, 'epoch': 2.5}
{'loss': 0.2953, 'learning_rate': 0.00019404624277456645, 'epoch': 2.5}
{'loss': 0.3109, 'learning_rate': 0.00019387283236994217, 'epoch': 2.5}
{'loss': 0.3141, 'learning_rate': 0.00019369942196531792, 'epoch': 2.5}
50%|█████████████████████████████████████▌ | 1118/2230 [3:43:13<3:40:46, 11.91s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 20:42:33,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2811, 'learning_rate': 0.00019335260115606933, 'epoch': 2.51}
{'loss': 0.308, 'learning_rate': 0.00019317919075144505, 'epoch': 2.51}
{'loss': 0.3758, 'learning_rate': 0.0001930057803468208, 'epoch': 2.51}
{'loss': 0.3172, 'learning_rate': 0.00019283236994219652, 'epoch': 2.52}
{'loss': 0.3789, 'learning_rate': 0.00019265895953757224, 'epoch': 2.52}
50%|█████████████████████████████████████▊ | 1124/2230 [3:44:21<3:27:59, 11.28s/it]
{'loss': 0.326, 'learning_rate': 0.00019248554913294796, 'epoch': 2.52}
{'loss': 0.3814, 'learning_rate': 0.0001923121387283237, 'epoch': 2.52}
{'loss': 0.3633, 'learning_rate': 0.0001921387283236994, 'epoch': 2.52}
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:07,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.33, 'learning_rate': 0.00019196531791907512, 'epoch': 2.53}
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:17,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3174, 'learning_rate': 0.00019179190751445084, 'epoch': 2.53}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:44:25,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3249, 'learning_rate': 0.0001916184971098266, 'epoch': 2.53}
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:33,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:35,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2607, 'learning_rate': 0.0001914450867052023, 'epoch': 2.53}
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:45,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:47,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:49,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:52,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:54,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:56,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:44:58,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:45:00,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:04,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:06,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:08,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:10,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:11,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:13,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:17,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:18,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:20,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:22,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:25,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:26,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:28,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:31,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:32,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:33,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:36,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:38,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:40,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:40,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:42,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:44,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:44,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:46,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:48,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:48,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:51,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:51,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:52,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:52,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:55,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:59,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:45:59,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:03,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:03,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:06,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:06,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.7567, 'learning_rate': 0.00018919075144508668, 'epoch': 2.56}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:10,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:13,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:13,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:17,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:17,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:17,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:20,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:24,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:24,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:27,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:27,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:30,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:34,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:34,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.504, 'learning_rate': 0.00018884393063583815, 'epoch': 2.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:37,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:37,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:40,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:44,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:44,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:47,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:47,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4176, 'learning_rate': 0.00018867052023121387, 'epoch': 2.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:51,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4394, 'learning_rate': 0.0001884971098265896, 'epoch': 2.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3826, 'learning_rate': 0.00018832369942196528, 'epoch': 2.57}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4139, 'learning_rate': 0.00018815028901734103, 'epoch': 2.58}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4348, 'learning_rate': 0.00018797687861271675, 'epoch': 2.58}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4796, 'learning_rate': 0.00018780346820809247, 'epoch': 2.58}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4092, 'learning_rate': 0.0001876300578034682, 'epoch': 2.58}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:46:54,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3758, 'learning_rate': 0.00018728323699421963, 'epoch': 2.59} 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3988, 'learning_rate': 0.00018710982658959536, 'epoch': 2.59} 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3302, 'learning_rate': 0.00018693641618497108, 'epoch': 2.59} 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4761, 'learning_rate': 0.00018676300578034682, 'epoch': 2.59} 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4218, 'learning_rate': 0.00018658959537572254, 'epoch': 2.6} 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 52%|██████████████████████████████████████▊ | 1153/2230 [3:49:05<3:52:03, 12.93s/it] Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2673, 'learning_rate': 0.00018641618497109824, 'epoch': 2.6} [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3068, 'learning_rate': 0.00018624277456647396, 'epoch': 2.6} [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3173, 'learning_rate': 0.0001860693641618497, 'epoch': 2.6} [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:49:34,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2941, 'learning_rate': 0.00018589595375722543, 'epoch': 2.61} [WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 20:50:12,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3206, 'learning_rate': 0.00018572254335260115, 'epoch': 2.61}
{'loss': 0.4003, 'learning_rate': 0.00018554913294797684, 'epoch': 2.61}
{'loss': 0.302, 'learning_rate': 0.0001853757225433526, 'epoch': 2.61}
52%|███████████████████████████████████████▏ | 1166/2230 [3:51:45<3:34:36, 12.10s/it]
{'loss': 0.3248, 'learning_rate': 0.0001852023121387283, 'epoch': 2.61}
[WARNING|modeling_utils.py:388] 2022-03-23 20:51:11,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.334, 'learning_rate': 0.00018502890173410403, 'epoch': 2.62}
{'loss': 0.2525, 'learning_rate': 0.00018485549132947975, 'epoch': 2.62}
[WARNING|modeling_utils.py:388] 2022-03-23 20:51:30,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|███████████████████████████████████████▎ | 1169/2230 [3:52:21<3:29:12, 11.83s/it]
{'loss': 0.3136, 'learning_rate': 0.0001846820809248555, 'epoch': 2.62}
{'loss': 0.2709, 'learning_rate': 0.0001845086705202312, 'epoch': 2.62}
53%|███████████████████████████████████████▍ | 1171/2230 [3:52:44<3:25:21, 11.64s/it]
{'loss': 0.2941, 'learning_rate': 0.0001843352601156069, 'epoch': 2.63}
[WARNING|modeling_utils.py:388] 2022-03-23 20:52:03,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.301, 'learning_rate': 0.00018416184971098263, 'epoch': 2.63}
[WARNING|modeling_utils.py:388] 2022-03-23 20:52:17,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 20:52:21,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3081, 'learning_rate': 0.00018398843930635838, 'epoch': 2.63}
[WARNING|modeling_bart.py:1051] 2022-03-23 20:52:32,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2646, 'learning_rate': 0.0001838150289017341, 'epoch': 2.63}
53%|███████████████████████████████████████▌ | 1175/2230 [3:53:29<3:19:28, 11.34s/it]
{'loss': 0.3115, 'learning_rate': 0.00018346820809248552, 'epoch': 2.64}
{'loss': 0.3095, 'learning_rate': 0.00018329479768786124, 'epoch': 2.64}
[WARNING|modeling_utils.py:388] 2022-03-23 20:53:10,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2983, 'learning_rate': 0.00018312138728323698, 'epoch': 2.64} [WARNING|modeling_utils.py:388] 2022-03-23 20:53:10,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:21,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:21,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:21,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:53:21,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:27,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:27,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:27,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:33,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:33,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:33,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3024, 'learning_rate': 0.00018277456647398843, 'epoch': 2.65} [WARNING|modeling_utils.py:388] 2022-03-23 20:53:39,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:41,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:41,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:37:52,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 53%|███████████████████████████████████████▋ | 1181/2230 [3:54:29<2:56:13, 10.08s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 53%|███████████████████████████████████████▋ | 1181/2230 [3:54:29<2:56:13, 10.08s/it][WARNING|modeling_bart.py:1051] 2022-03-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2837, 'learning_rate': 0.00018260115606936412, 'epoch': 2.65} [WARNING|modeling_utils.py:388] 2022-03-23 20:53:49,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 20:53:52,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:54,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 20:53:54,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2031, 'learning_rate': 0.00018242774566473987, 'epoch': 2.65} [WARNING|modeling_bart.py:1051] 2022-03-23 20:53:58,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:00,529 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:02,595 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:02,595 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:04,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:06,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:08,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:10,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:10,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:12,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:14,405 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:16,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:17,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:17,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:19,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:23,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:24,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:24,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:26,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:27,934 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:30,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:30,873 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:32,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:35,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:36,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:36,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:39,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:40,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:40,487 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:42,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:44,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:44,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:46,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:49,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:49,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:51,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:52,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:52,571 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:55,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:55,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:59,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:54:59,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:55:02,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:55:02,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:55:06,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:55:06,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 20:55:09,804 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:09,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:13,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:16,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:20,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:23,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:26,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:30,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:33,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:37,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:40,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:43,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:47,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 20:55:50,460 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.4206, 'learning_rate': 0.0001798265895953757, 'epoch': 2.68}
{'loss': 0.4402, 'learning_rate': 0.00017965317919075145, 'epoch': 2.69}
{'loss': 0.3913, 'learning_rate': 0.00017947976878612715, 'epoch': 2.69}
{'loss': 0.3591, 'learning_rate': 0.00017930635838150287, 'epoch': 2.69}
{'loss': 0.3962, 'learning_rate': 0.0001791329479768786, 'epoch': 2.69}
{'loss': 0.3368, 'learning_rate': 0.00017895953757225434, 'epoch': 2.7}
{'loss': 0.339, 'learning_rate': 0.00017878612716763006, 'epoch': 2.7}
{'loss': 0.2718, 'learning_rate': 0.00017861271676300575, 'epoch': 2.7}
{'loss': 0.2501, 'learning_rate': 0.00017843930635838147, 'epoch': 2.7}
{'loss': 0.3284, 'learning_rate': 0.00017826589595375722, 'epoch': 2.7}
{'loss': 0.2951, 'learning_rate': 0.00017809248554913294, 'epoch': 2.71}
54%|████████████████████████████████████████▍ | 1202/2230 [3:57:51<3:42:05, 12.96s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length.
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▍ | 1202/2230 [3:57:51<3:42:05, 12.96s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▍ | 1202/2230 [3:57:51<3:42:05, 12.96s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▍ | 1202/2230 [3:57:51<3:42:05, 12.96s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 54%|████████████████████████████████████████▋ | 1208/2230 [3:59:07<3:34:38, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
54%|████████████████████████████████████████▋ | 1209/2230 [3:59:19<3:33:25, 12.54s/it]
{'loss': 0.3452, 'learning_rate': 0.00017774566473988435, 'epoch': 2.71}
54%|████████████████████████████████████████▋ | 1210/2230 [3:59:31<3:32:12, 12.48s/it]
{'loss': 0.3002, 'learning_rate': 0.0001775722543352601, 'epoch': 2.71}
54%|████████████████████████████████████████▋ | 1211/2230 [3:59:44<3:30:52, 12.42s/it]
{'loss': 0.3082, 'learning_rate': 0.00017739884393063582, 'epoch': 2.72}
54%|████████████████████████████████████████▊ | 1212/2230 [3:59:56<3:29:41, 12.36s/it]
{'loss': 0.2915, 'learning_rate': 0.00017722543352601154, 'epoch': 2.72}
{'loss': 0.3227, 'learning_rate': 0.00017705202312138726, 'epoch': 2.72}
54%|████████████████████████████████████████▊ | 1214/2230 [4:00:21<3:28:54, 12.34s/it]
{'loss': 0.2923, 'learning_rate': 0.000176878612716763, 'epoch': 2.72}
[WARNING|modeling_utils.py:388] 2022-03-23 20:59:51,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3016, 'learning_rate': 0.00017653179190751442, 'epoch': 2.73}
{'loss': 0.2278, 'learning_rate': 0.00017635838150289015, 'epoch': 2.73}
[WARNING|modeling_utils.py:388] 2022-03-23 20:59:51,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
55%|████████████████████████████████████████▉ | 1218/2230 [4:01:08<3:20:22, 11.88s/it]
{'loss': 0.2899, 'learning_rate': 0.0001761849710982659, 'epoch': 2.73}
{'loss': 0.2595, 'learning_rate': 0.00017601156069364161, 'epoch': 2.73}
[WARNING|modeling_utils.py:388] 2022-03-23 21:00:46,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2312, 'learning_rate': 0.0001758381502890173, 'epoch': 2.74}
[WARNING|modeling_utils.py:388] 2022-03-23 21:00:56,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2796, 'learning_rate': 0.00017566473988439303, 'epoch': 2.74}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:01:03,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:01:09,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2577, 'learning_rate': 0.00017549132947976878, 'epoch': 2.74}
{'loss': 0.3298, 'learning_rate': 0.0001753179190751445, 'epoch': 2.74}
{'loss': 0.2904, 'learning_rate': 0.00017514450867052022, 'epoch': 2.74}
{'loss': 0.2401, 'learning_rate': 0.00017497109826589594, 'epoch': 2.75}
[WARNING|modeling_utils.py:388] 2022-03-23 21:01:45,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:01:49,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
55%|█████████████████████████████████████████▏ | 1226/2230 [4:02:38<3:07:08, 11.18s/it]
{'loss': 0.273, 'learning_rate': 0.00017479768786127169, 'epoch': 2.75}
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:04,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2479, 'learning_rate': 0.00017462427745664738, 'epoch': 2.75}
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:14,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.325, 'learning_rate': 0.0001744508670520231, 'epoch': 2.75}
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:22,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2803, 'learning_rate': 0.00017427745664739882, 'epoch': 2.76}
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:28,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:34,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2399, 'learning_rate': 0.00017410404624277457, 'epoch': 2.76}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:02:38,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:42,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:45,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:02:49,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:53,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:02:55,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:02:59,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:01,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:03,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:05,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:07,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:09,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:11,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:13,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:15,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:17,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:19,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:20,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:24,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:25,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:27,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:28,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:32,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:33,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:35,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:37,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:40,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:41,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:43,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:45,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:48,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:50,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:52,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3402, 'learning_rate': 0.00017202312138728324, 'epoch': 2.78}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:55,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:03:59,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:02,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:06,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:09,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:13,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:16,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:20,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:23,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:27,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:30,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:33,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4598, 'learning_rate': 0.00017150289017341038, 'epoch': 2.79}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:37,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:40,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:44,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:04:47,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4237, 'learning_rate': 0.00017132947976878613, 'epoch': 2.79}
{'loss': 0.4052, 'learning_rate': 0.00017115606936416185, 'epoch': 2.8}
{'loss': 0.3596, 'learning_rate': 0.00017098265895953757, 'epoch': 2.8}
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3375, 'learning_rate': 0.00017080924855491326, 'epoch': 2.8} Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3509, 'learning_rate': 0.000170635838150289, 'epoch': 2.8} Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.3337, 'learning_rate': 0.00017046242774566473, 'epoch': 2.8} Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.4023, 'learning_rate': 0.00017028901734104045, 'epoch': 2.81} 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2961, 'learning_rate': 0.00017011560693641617, 'epoch': 2.81} 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2642, 'learning_rate': 0.00016994219653179192, 'epoch': 2.81} 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.3523, 'learning_rate': 0.0001697687861271676, 'epoch': 2.81} 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████ | 1252/2230 [4:06:52<3:31:48, 12.99s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2547, 'learning_rate': 0.00016959537572254333, 'epoch': 2.82} 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2283, 'learning_rate': 0.00016942196531791905, 'epoch': 2.82} 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▏ | 1256/2230 [4:07:43<3:27:25, 12.78s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2924, 'learning_rate': 0.0001692485549132948, 'epoch': 2.82} 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1258/2230 [4:08:08<3:25:14, 12.67s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3059, 'learning_rate': 0.00016907514450867052, 'epoch': 2.82} 56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 56%|██████████████████████████████████████████▎ | 1259/2230 [4:08:20<3:23:53, 12.60s/it] Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2438, 'learning_rate': 0.00016890173410404622, 'epoch': 2.83} Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2312, 'learning_rate': 0.00016872832369942194, 'epoch': 2.83} Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. 
...-23 20:53:46,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
57%|██████████████████████████████████████████▍ | 1262/2230 [4:08:57<3:19:41, 12.38s/it]
{'loss': 0.2383, 'learning_rate': 0.00016855491329479768, 'epoch': 2.83}
{'loss': 0.2653, 'learning_rate': 0.0001683815028901734, 'epoch': 2.83}
{'loss': 0.2651, 'learning_rate': 0.00016820809248554913, 'epoch': 2.83}
57%|██████████████████████████████████████████▌ | 1265/2230 [4:09:34<3:16:21, 12.21s/it]
{'loss': 0.3351, 'learning_rate': 0.00016803468208092482, 'epoch': 2.84}
{'loss': 0.2531, 'learning_rate': 0.00016786127167630057, 'epoch': 2.84}
{'loss': 0.3217, 'learning_rate': 0.0001676878612716763, 'epoch': 2.84}
57%|██████████████████████████████████████████▋ | 1268/2230 [4:10:09<3:10:57, 11.91s/it]
{'loss': 0.2335, 'learning_rate': 0.000167514450867052, 'epoch': 2.84}
57%|██████████████████████████████████████████▋ | 1269/2230 [4:10:21<3:09:27, 11.83s/it]
{'loss': 0.2112, 'learning_rate': 0.00016716763005780348, 'epoch': 2.85}
[WARNING|modeling_utils.py:388] 2022-03-23 21:09:53,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████▋ | 1271/2230 [4:10:44<3:05:31, 11.61s/it]
{'loss': 0.2423, 'learning_rate': 0.0001669942196531792, 'epoch': 2.85}
{'loss': 0.2758, 'learning_rate': 0.0001668208092485549, 'epoch': 2.85}
[WARNING|modeling_utils.py:388] 2022-03-23 21:10:24,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:10:28,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████▊ | 1274/2230 [4:11:17<2:59:51, 11.29s/it]
{'loss': 0.2411, 'learning_rate': 0.00016647398843930633, 'epoch': 2.86}
{'loss': 0.2418, 'learning_rate': 0.00016630057803468208, 'epoch': 2.86}
[WARNING|modeling_utils.py:388] 2022-03-23 21:10:49,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████▉ | 1276/2230 [4:11:39<2:55:50, 11.06s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 21:10:57,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████▉ | 1277/2230 [4:11:49<2:52:16, 10.85s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:07,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:14,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2374, 'learning_rate': 0.00016578034682080922, 'epoch': 2.87}
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:21,524 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|███████████████████████████████████████████ | 1279/2230 [4:12:09<2:45:10, 10.42s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:11:26,078 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:30,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:36,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:11:40,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:44,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:46,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:11:50,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:11:52,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.301, 'learning_rate': 0.0001650867052023121, 'epoch': 2.87}
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:56,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:11:58,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|███████████████████████████████████████████▏ | 1283/2230 [4:12:46<2:26:25, 9.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:02,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:04,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:06,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:08,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▏ | 1284/2230 [4:12:53<2:18:55, 8.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:09,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:11,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:13,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▏ | 1285/2230 [4:13:01<2:11:25, 8.34s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:17,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:18,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:20,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:22,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▎ | 1286/2230 [4:13:07<2:03:02, 7.82s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:23,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:28,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:29,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:31,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:32,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▎ | 1288/2230 [4:13:19<1:47:24, 6.84s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:35,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:36,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:39,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:40,218 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:41,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:43,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:44,354 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:45,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:46,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:47,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:49,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▍ | 1292/2230 [4:13:35<1:09:11, 4.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:51,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:55,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:12:59,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:02,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▍ | 1293/2230 [4:13:49<1:55:10, 7.38s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:06,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:09,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:12,987 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-23 21:13:06,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:12,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:06,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:16,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:06,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:16,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:06,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1294/2230 [4:14:03<2:25:07, 9.30s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1294/2230 [4:14:03<2:25:07, 9.30s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:23,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:23,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:26,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:13:30,043 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1295/2230 [4:14:16<2:45:13, 10.60s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1295/2230 [4:14:16<2:45:13, 10.60s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:19,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1295/2230 [4:14:16<2:45:13, 10.60s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:13:33,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▌ | 1295/2230 [4:14:16<2:45:13, 10.60s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:13:33,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:36,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:40,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:43,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▌ | 1296/2230 [4:14:30<2:58:51, 11.49s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:13:50,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3135, 'learning_rate': 0.00016248554913294796, 'epoch': 2.91}
{'loss': 0.3618, 'learning_rate': 0.00016231213872832368, 'epoch': 2.91}
{'loss': 0.3442, 'learning_rate': 0.00016213872832369943, 'epoch': 2.91}
{'loss': 0.3581, 'learning_rate': 0.00016196531791907512, 'epoch': 2.91}
{'loss': 0.3073, 'learning_rate': 0.00016179190751445085, 'epoch': 2.92}
58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it]
{'loss': 0.3005, 'learning_rate': 0.00016161849710982657, 'epoch': 2.92}
{'loss': 0.2618, 'learning_rate': 0.00016144508670520231, 'epoch': 2.92} 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3217, 'learning_rate': 0.00016127167630057803, 'epoch': 2.92} 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2993, 'learning_rate': 0.00016109826589595373, 'epoch': 2.93} 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2586, 'learning_rate': 0.00016092485549132945, 'epoch': 2.93} 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 58%|███████████████████████████████████████████▊ | 1302/2230 [4:15:50<3:21:40, 13.04s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2481, 'learning_rate': 0.0001607514450867052, 'epoch': 2.93} 59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
59%|███████████████████████████████████████████▉ | 1307/2230 [4:16:53<3:15:37, 12.72s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2872, 'learning_rate': 0.00016057803468208092, 'epoch': 2.93} 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|███████████████████████████████████████████▉ | 1308/2230 [4:17:06<3:14:10, 12.64s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.2772, 'learning_rate': 0.00016040462427745664, 'epoch': 2.93} 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1309/2230 [4:17:18<3:12:48, 12.56s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2466, 'learning_rate': 0.00016023121387283233, 'epoch': 2.94} 59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
59%|████████████████████████████████████████████ | 1310/2230 [4:17:30<3:11:24, 12.48s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3004, 'learning_rate': 0.00016005780346820808, 'epoch': 2.94} 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████ | 1311/2230 [4:17:43<3:09:54, 12.40s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1622, 'learning_rate': 0.0001598843930635838, 'epoch': 2.94} 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1312/2230 [4:17:55<3:08:29, 12.32s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2447, 'learning_rate': 0.00015971098265895952, 'epoch': 2.94} 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2302, 'learning_rate': 0.00015953757225433524, 'epoch': 2.95} 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.2657, 'learning_rate': 0.000159364161849711, 'epoch': 2.95} 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.267, 'learning_rate': 0.00015919075144508668, 'epoch': 2.95} 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 59%|████████████████████████████████████████████▏ | 1313/2230 [4:18:07<3:08:38, 12.34s/it] Setting `use_cache=False`...1] 2022-03-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
59%|████████████████████████████████████████████▎ | 1317/2230 [4:18:55<3:01:12, 11.91s/it]
{'loss': 0.243, 'learning_rate': 0.0001590173410404624, 'epoch': 2.95}
{'loss': 0.2366, 'learning_rate': 0.00015884393063583812, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-23 21:18:33,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2866, 'learning_rate': 0.00015867052023121387, 'epoch': 2.96}
{'loss': 0.2798, 'learning_rate': 0.0001584971098265896, 'epoch': 2.96}
59%|████████████████████████████████████████████▍ | 1321/2230 [4:19:40<2:53:34, 11.46s/it]
{'loss': 0.3047, 'learning_rate': 0.0001583236994219653, 'epoch': 2.96}
{'loss': 0.2196, 'learning_rate': 0.000158150289017341, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:12,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2324, 'learning_rate': 0.00015797687861271675, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:28,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2934, 'learning_rate': 0.00015780346820809248, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:35,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2647, 'learning_rate': 0.0001576300578034682, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:43,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.245, 'learning_rate': 0.00015745664739884392, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:53,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:19:59,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2012, 'learning_rate': 0.00015728323699421966, 'epoch': 2.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:03,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:20:07,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:12,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:20:16,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:20:18,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.229, 'learning_rate': 0.00015693641618497108, 'epoch': 2.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:22,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:24,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:26,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:28,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:30,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:32,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:34,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:36,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:38,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:40,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:41,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:43,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:46,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:48,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:50,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:53,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:20:54,343 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:20:57,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:20:59,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:20:59,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:00,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:02,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:02,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:04,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:07,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:07,464 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:09,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:09,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:10,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:10,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:14,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:14,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:17,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:21,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:21,378 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:24,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:24,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.3332, 'learning_rate': 0.0001552023121387283, 'epoch': 3.0} [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:28,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:31,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:31,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:35,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:35,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:21:35,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:38,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:42,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:45,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:48,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:52,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3005, 'learning_rate': 0.00015485549132947975, 'epoch': 3.01}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:55,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:21:59,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:22:02,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1913, 'learning_rate': 0.00015468208092485547, 'epoch': 3.01}
{'loss': 0.2485, 'learning_rate': 0.00015450867052023122, 'epoch': 3.01}
{'loss': 0.2041, 'learning_rate': 0.00015433526011560692, 'epoch': 3.01}
{'loss': 0.2291, 'learning_rate': 0.00015416184971098264, 'epoch': 3.02}
[WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1707, 'learning_rate': 0.0001538150289017341, 'epoch': 3.02}
[WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2488, 'learning_rate': 0.00015364161849710983, 'epoch': 3.02} [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.167, 'learning_rate': 0.00015346820809248555, 'epoch': 3.02} [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:23:02,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▍ | 1350/2230 [4:24:35<3:10:19, 12.98s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▍ | 1350/2230 [4:24:35<3:10:19, 12.98s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1798, 'learning_rate': 0.00015329479768786124, 'epoch': 3.03} 61%|█████████████████████████████████████████████▍ | 1350/2230 [4:24:35<3:10:19, 12.98s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▍ | 1350/2230 [4:24:35<3:10:19, 12.98s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1798, 'learning_rate': 0.000153121387283237, 'epoch': 3.03} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1232, 'learning_rate': 0.0001529479768786127, 'epoch': 3.03} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.171, 'learning_rate': 0.00015277456647398843, 'epoch': 3.03} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2154, 'learning_rate': 0.00015260115606936415, 'epoch': 3.04} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1341, 'learning_rate': 0.0001524277456647399, 'epoch': 3.04} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1676, 'learning_rate': 0.0001522543352601156, 'epoch': 3.04} [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:24:00,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1611, 'learning_rate': 0.0001520809248554913, 'epoch': 3.04}
{'loss': 0.1838, 'learning_rate': 0.00015190751445086703, 'epoch': 3.04}
{'loss': 0.164, 'learning_rate': 0.00015173410404624278, 'epoch': 3.05}
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:25:49,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:25:49,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:25:49,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1697, 'learning_rate': 0.0001515606936416185, 'epoch': 3.05} 61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▋ | 1360/2230 [4:26:39<2:57:22, 12.23s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1361/2230 [4:26:51<2:56:12, 12.17s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1361/2230 [4:26:51<2:56:12, 12.17s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1793, 'learning_rate': 0.0001513872832369942, 'epoch': 3.05} 61%|█████████████████████████████████████████████▊ | 1361/2230 [4:26:51<2:56:12, 12.17s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1361/2230 [4:26:51<2:56:12, 12.17s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1421, 'learning_rate': 0.00015121387283236992, 'epoch': 3.05} [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1488, 'learning_rate': 0.00015104046242774566, 'epoch': 3.06} [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:26:16,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1655, 'learning_rate': 0.00015086705202312138, 'epoch': 3.06} 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▊ | 1364/2230 [4:27:27<2:53:30, 12.02s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.2056, 'learning_rate': 0.0001506936416184971, 'epoch': 3.06} 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.121, 'learning_rate': 0.0001505202312138728, 'epoch': 3.06} 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1365/2230 [4:27:39<2:51:32, 11.90s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1514, 'learning_rate': 0.00015034682080924855, 'epoch': 3.07} 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1511, 'learning_rate': 0.00015017341040462427, 'epoch': 3.07} 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 61%|█████████████████████████████████████████████▉ | 1367/2230 [4:28:02<2:47:30, 11.65s/it]g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1363, 'learning_rate': 0.00015, 'epoch': 3.07} g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1644, 'learning_rate': 0.0001498265895953757, 'epoch': 3.07} g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1106, 'learning_rate': 0.00014965317919075143, 'epoch': 3.07} g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:28:08,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:13:47,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
62%|██████████████████████████████████████████████▏ | 1372/2230 [4:28:57<2:37:39, 11.03s/it]
{'loss': 0.1259, 'learning_rate': 0.00014947976878612715, 'epoch': 3.08}
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:19,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
62%|██████████████████████████████████████████████▏ | 1373/2230 [4:29:07<2:34:50, 10.84s/it]
{'loss': 0.1348, 'learning_rate': 0.00014930635838150287, 'epoch': 3.08}
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:29,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
62%|██████████████████████████████████████████████▏ | 1374/2230 [4:29:17<2:32:11, 10.67s/it]
{'loss': 0.1144, 'learning_rate': 0.0001491329479768786, 'epoch': 3.08}
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:39,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1318, 'learning_rate': 0.0001489595375722543, 'epoch': 3.08}
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:49,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
62%|██████████████████████████████████████████████▎ | 1376/2230 [4:29:38<2:27:28, 10.36s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:55,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:28:58,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:02,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:29:06,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:29:08,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:29:10,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:14,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:16,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:18,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▍ | 1379/2230 [4:30:04<2:11:47, 9.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:20,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:22,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:24,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:26,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▍ | 1380/2230 [4:30:12<2:05:51, 8.88s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:28,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:30,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:32,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:34,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▍ | 1381/2230 [4:30:19<1:59:19, 8.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:36,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:39,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:41,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▍ | 1382/2230 [4:30:26<1:52:23, 7.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:42,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:44,335 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:47,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▌ | 1383/2230 [4:30:32<1:44:44, 7.42s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:48,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:51,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:52,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:55,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:56,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▌ | 1385/2230 [4:30:43<1:28:00, 6.25s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:29:59,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:01,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▌ | 1386/2230 [4:30:47<1:19:39, 5.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:03,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:05,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:07,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:09,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▋ | 1388/2230 [4:30:54<1:04:27, 4.59s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:11,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:15,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:18,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:22,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▋ | 1389/2230 [4:31:09<1:45:13, 7.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:25,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:29,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:32,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:35,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▋ | 1390/2230 [4:31:22<2:11:42, 9.41s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:39,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:42,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:46,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:49,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▊ | 1391/2230 [4:31:36<2:28:52, 10.65s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:53,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:56,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:30:59,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:31:03,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|██████████████████████████████████████████████▊ | 1392/2230 [4:31:49<2:40:21, 11.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2271, 'learning_rate': 0.00014583815028901734, 'epoch': 3.12}
{'loss': 0.2333, 'learning_rate': 0.00014566473988439306, 'epoch': 3.13}
{'loss': 0.2391, 'learning_rate': 0.00014549132947976878, 'epoch': 3.13}
63%|██████████████████████████████████████████████▉ | 1396/2230 [4:32:42<2:57:50, 12.79s/it]
{'loss': 0.563, 'learning_rate': 0.0001453179190751445, 'epoch': 3.13}
{'loss': 0.1767, 'learning_rate': 0.00014514450867052022, 'epoch': 3.13}
63%|███████████████████████████████████████████████ | 1398/2230 [4:33:08<2:59:01, 12.91s/it]
{'loss': 0.1731, 'learning_rate': 0.00014479768786127166, 'epoch': 3.14}
{'loss': 0.2047, 'learning_rate': 0.00014462427745664738, 'epoch': 3.14}
63%|███████████████████████████████████████████████ | 1398/2230 [4:33:08<2:59:01, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1398/2230 [4:33:08<2:59:01, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1805, 'learning_rate': 0.0001444508670520231, 'epoch': 3.14} 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████ | 1401/2230 [4:33:47<2:58:25, 12.91s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1929, 'learning_rate': 0.00014427745664739882, 'epoch': 3.14} 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▏ | 1402/2230 [4:34:00<2:57:39, 12.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1438, 'learning_rate': 0.00014410404624277454, 'epoch': 3.15} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1483, 'learning_rate': 0.00014393063583815026, 'epoch': 3.15} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1981, 'learning_rate': 0.000143757225433526, 'epoch': 3.15} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1836, 'learning_rate': 0.0001435838150289017, 'epoch': 3.15} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1852, 'learning_rate': 0.00014341040462427745, 'epoch': 3.15} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1698, 'learning_rate': 0.00014323699421965317, 'epoch': 3.16} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1216, 'learning_rate': 0.0001430635838150289, 'epoch': 3.16} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1676, 'learning_rate': 0.00014289017341040462, 'epoch': 3.16} [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 21:33:32,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1809, 'learning_rate': 0.00014271676300578034, 'epoch': 3.16} 63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1411/2230 [4:35:51<2:46:22, 12.19s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1413, 'learning_rate': 0.00014254335260115606, 'epoch': 3.17} 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▍ | 1412/2230 [4:36:03<2:44:55, 12.10s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.137, 'learning_rate': 0.00014236994219653178, 'epoch': 3.17} 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1761, 'learning_rate': 0.0001421965317919075, 'epoch': 3.17} 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.126, 'learning_rate': 0.00014202312138728322, 'epoch': 3.17} 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1413/2230 [4:36:15<2:45:03, 12.12s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1518, 'learning_rate': 0.00014184971098265894, 'epoch': 3.17} 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1303, 'learning_rate': 0.00014167630057803466, 'epoch': 3.18} 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 63%|███████████████████████████████████████████████▌ | 1416/2230 [4:36:50<2:40:01, 11.80s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1535, 'learning_rate': 0.00014150289017341038, 'epoch': 3.18} 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1286, 'learning_rate': 0.00014132947976878613, 'epoch': 3.18} 64%|███████████████████████████████████████████████▋ | 1418/2230 [4:37:12<2:36:34, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:45,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:36:45,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:45,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:45,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:45,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:36:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1791, 'learning_rate': 0.00014098265895953757, 'epoch': 3.19} 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 64%|███████████████████████████████████████████████▊ | 1421/2230 [4:37:45<2:30:43, 11.18s/it]g-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:13,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:13,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:13,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:37:13,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:13,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.5027, 'learning_rate': 0.000140635838150289, 'epoch': 3.19} [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:31:06,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:37:23,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:37:36,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1555, 'learning_rate': 0.00014028901734104045, 'epoch': 3.2}
[WARNING|modeling_utils.py:388] 2022-03-23 21:37:48,412 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:37:52,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 21:37:56,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:38:02,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:38:04,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:09,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:11,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1183, 'learning_rate': 0.00013976878612716762, 'epoch': 3.2}
[WARNING|modeling_utils.py:388] 2022-03-23 21:38:15,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:38:17,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:38:19,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1121, 'learning_rate': 0.00013959537572254334, 'epoch': 3.2}
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:23,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:25,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:27,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████ | 1430/2230 [4:39:13<2:00:33, 9.04s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:29,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:31,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:32,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:34,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1431/2230 [4:39:20<1:54:03, 8.57s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:36,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:39,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:41,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1432/2230 [4:39:27<1:46:45, 8.03s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:43,243 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:44,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:46,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1433/2230 [4:39:33<1:39:16, 7.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:49,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:50,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:53,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▏ | 1434/2230 [4:39:38<1:31:21, 6.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:54,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:57,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▎ | 1435/2230 [4:39:43<1:23:07, 6.27s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:38:59,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:01,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▎ | 1436/2230 [4:39:48<1:15:13, 5.68s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:03,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:05,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▎ | 1437/2230 [4:39:51<1:07:10, 5.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:07,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:09,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
64%|████████████████████████████████████████████████▎ | 1438/2230 [4:39:55<1:00:49, 4.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:12,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:15,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:19,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:22,647 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
65%|████████████████████████████████████████████████▍ | 1439/2230 [4:40:09<1:39:17, 7.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:26,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:29,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:33,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:36,639 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
65%|████████████████████████████████████████████████▍ | 1440/2230 [4:40:23<2:04:30, 9.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:40,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:43,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:46,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:50,345 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
65%|████████████████████████████████████████████████▍ | 1441/2230 [4:40:37<2:20:56, 10.72s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:39:57,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:40:00,510 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:40:03,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
65%|████████████████████████████████████████████████▍ | 1442/2230 [4:40:50<2:31:32, 11.54s/it]
{'loss': 0.2625, 'learning_rate': 0.00013734104046242773, 'epoch': 3.23}
65%|████████████████████████████████████████████████▌ | 1443/2230 [4:41:04<2:38:25, 12.08s/it]
{'loss': 0.1896, 'learning_rate': 0.00013716763005780345, 'epoch': 3.24}
{'loss': 0.1894, 'learning_rate': 0.00013699421965317917, 'epoch': 3.24}
65%|████████████████████████████████████████████████▌ | 1445/2230 [4:41:30<2:46:02, 12.69s/it]
{'loss': 0.1994, 'learning_rate': 0.00013682080924855492, 'epoch': 3.24}
{'loss': 0.1791, 'learning_rate': 0.00013664739884393061, 'epoch': 3.24}
{'loss': 0.213, 'learning_rate': 0.00013647398843930636, 'epoch': 3.24}
{'loss': 0.1923, 'learning_rate': 0.00013630057803468206, 'epoch': 3.25}
{'loss': 0.1412, 'learning_rate': 0.0001361271676300578, 'epoch': 3.25}
{'loss': 0.1965, 'learning_rate': 0.00013595375722543352, 'epoch': 3.25}
65%|████████████████████████████████████████████████▊ | 1451/2230 [4:42:48<2:47:44, 12.92s/it]
{'loss': 0.1976, 'learning_rate': 0.00013578034682080925, 'epoch': 3.25}
{'loss': 0.1655, 'learning_rate': 0.00013560693641618497, 'epoch': 3.26}
[WARNING|modeling_utils.py:388] 2022-03-23 21:42:23,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2195, 'learning_rate': 0.0001354335260115607, 'epoch': 3.26}
{'loss': 0.1942, 'learning_rate': 0.0001352601156069364, 'epoch': 3.26}
{'loss': 0.1557, 'learning_rate': 0.00013508670520231213, 'epoch': 3.26}
{'loss': 0.155, 'learning_rate': 0.00013491329479768785, 'epoch': 3.26}
{'loss': 0.1443, 'learning_rate': 0.00013473988439306357, 'epoch': 3.27}
{'loss': 0.1514, 'learning_rate': 0.0001345664739884393, 'epoch': 3.27}
{'loss': 0.1367, 'learning_rate': 0.00013439306358381504, 'epoch': 3.27}
{'loss': 0.1459, 'learning_rate': 0.00013421965317919073, 'epoch': 3.27}
{'loss': 0.1873, 'learning_rate': 0.00013404624277456648, 'epoch': 3.28}
[WARNING|modeling_utils.py:388] 2022-03-23 21:44:19,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1342, 'learning_rate': 0.00013387283236994217, 'epoch': 3.28}
66%|█████████████████████████████████████████████████▏ | 1463/2230 [4:45:16<2:34:53, 12.12s/it]
{'loss': 0.1421, 'learning_rate': 0.00013369942196531792, 'epoch': 3.28}
[WARNING|modeling_utils.py:388] 2022-03-23 21:44:42,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1153, 'learning_rate': 0.00013352601156069364, 'epoch': 3.28}
{'loss': 0.1168, 'learning_rate': 0.00013335260115606936, 'epoch': 3.28}
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1794, 'learning_rate': 0.00013317919075144508, 'epoch': 3.29} [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2179, 'learning_rate': 0.00013300578034682078, 'epoch': 3.29} [WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:39:53,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 21:45:06,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1447, 'learning_rate': 0.00013283236994219652, 'epoch': 3.29}
[WARNING|modeling_utils.py:388] 2022-03-23 21:45:39,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1838, 'learning_rate': 0.00013265895953757224, 'epoch': 3.29}
{'loss': 0.1361, 'learning_rate': 0.00013248554913294797, 'epoch': 3.3}
[WARNING|modeling_utils.py:388] 2022-03-23 21:45:39,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1267, 'learning_rate': 0.00013231213872832369, 'epoch': 3.3}
66%|█████████████████████████████████████████████████▌ | 1472/2230 [4:46:58<2:20:49, 11.15s/it]
{'loss': 0.1564, 'learning_rate': 0.0001321387283236994, 'epoch': 3.3}
{'loss': 0.1736, 'learning_rate': 0.00013196531791907513, 'epoch': 3.3}
[WARNING|modeling_utils.py:388] 2022-03-23 21:46:28,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:46:28,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1423, 'learning_rate': 0.00013179190751445085, 'epoch': 3.3}
[WARNING|modeling_utils.py:388] 2022-03-23 21:46:38,768 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1301, 'learning_rate': 0.00013161849710982657, 'epoch': 3.31}
[WARNING|modeling_utils.py:388] 2022-03-23 21:46:52,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1068, 'learning_rate': 0.0001314450867052023, 'epoch': 3.31}
[WARNING|modeling_utils.py:388] 2022-03-23 21:46:58,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:03,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▋ | 1477/2230 [4:47:49<2:07:33, 10.16s/it]
66%|█████████████████████████████████████████████████▋ | 1477/2230 [4:47:49<2:07:33, 10.16s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 21:47:07,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 21:47:09,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:13,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:15,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:21,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▋ | 1479/2230 [4:48:07<1:59:12, 9.52s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:23,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:25,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:27,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:29,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1480/2230 [4:48:15<1:53:34, 9.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:31,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:33,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:35,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:36,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1481/2230 [4:48:22<1:47:21, 8.60s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:47:38,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:40,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:43,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
66%|█████████████████████████████████████████████████▊ | 1482/2230 [4:48:29<1:40:35, 8.07s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:47:45,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:47,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:48,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1483/2230 [4:48:35<1:33:27, 7.51s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:47:51,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:53,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:55,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1484/2230 [4:48:41<1:26:08, 6.93s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:47:57,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:47:59,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1485/2230 [4:48:46<1:18:36, 6.33s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:02,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:03,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|█████████████████████████████████████████████████▉ | 1486/2230 [4:48:50<1:11:27, 5.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:06,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:07,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1487/2230 [4:48:54<1:03:53, 5.16s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:10,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:12,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|███████████████████████████████████████████████████▍ | 1488/2230 [4:48:58<58:01, 4.69s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:14,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:18,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:21,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:25,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1489/2230 [4:49:12<1:33:41, 7.59s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:29,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:36,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:39,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████ | 1490/2230 [4:49:26<1:56:49, 9.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:42,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:46,368 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:49,752 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:48:53,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████▏ | 1491/2230 [4:49:40<2:12:25, 10.75s/it][WARNING|modeling_bart.py:1051] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:49:00,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:49:03,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 21:49:06,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it]
{'loss': 0.2052, 'learning_rate': 0.00012867052023121387, 'epoch': 3.35}
{'loss': 0.1749, 'learning_rate': 0.00012849710982658957, 'epoch': 3.35}
{'loss': 0.1611, 'learning_rate': 0.00012832369942196532, 'epoch': 3.35}
67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2097, 'learning_rate': 0.00012815028901734104, 'epoch': 3.35} 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1994, 'learning_rate': 0.00012797687861271676, 'epoch': 3.35} 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1656, 'learning_rate': 0.00012780346820809248, 'epoch': 3.36} 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
67%|██████████████████████████████████████████████████▏ | 1492/2230 [4:49:53<2:22:22, 11.57s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1881, 'learning_rate': 0.0001276300578034682, 'epoch': 3.36} 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.214, 'learning_rate': 0.00012745664739884392, 'epoch': 3.36} 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 67%|██████████████████████████████████████████████████▍ | 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2364] 2022-03-23 21:50:55,136 >> ***** Running Evaluation *****| 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2364] 2022-03-23 21:50:55,136 >> ***** Running Evaluation *****| 1498/2230 [4:51:13<2:38:53, 13.02s/it] Setting `use_cache=False`...1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 03/23/2022 22:00:10 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow {'eval_loss': 0.34972885251045227, 'eval_wer': 0.1040063466878223, 'eval_runtime': 555.1742, 'eval_samples_per_second': 4.759, 'eval_steps_per_second': 0.596, 'epoch': 3.36} 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1352, 'learning_rate': 0.00012710982658959536, 'epoch': 3.37} 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1341, 'learning_rate': 0.00012693641618497108, 'epoch': 3.37} 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].1] 2022-03-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:02:19,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1614, 'learning_rate': 0.0001267630057803468, 'epoch': 3.37}
{'loss': 0.1356, 'learning_rate': 0.00012658959537572252, 'epoch': 3.37}
{'loss': 0.1486, 'learning_rate': 0.00012641618497109824, 'epoch': 3.37}
{'loss': 0.1557, 'learning_rate': 0.000126242774566474, 'epoch': 3.38}
{'loss': 0.1015, 'learning_rate': 0.00012606936416184968, 'epoch': 3.38}
68%|██████████████████████████████████████████████████▋ | 1508/2230 [5:04:17<5:50:44, 29.15s/it]
{'loss': 0.1544, 'learning_rate': 0.00012589595375722543, 'epoch': 3.38}
{'loss': 0.1357, 'learning_rate': 0.00012572254335260115, 'epoch': 3.38}
{'loss': 0.1212, 'learning_rate': 0.00012554913294797687, 'epoch': 3.39}
68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]
{'loss': 0.1688, 'learning_rate': 0.0001253757225433526, 'epoch': 3.39}
68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1234, 'learning_rate': 0.00012520231213872831, 'epoch': 3.39} 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]g-point operations will not be computed-23 21:48:56,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1307, 'learning_rate': 0.00012502890173410404, 'epoch': 3.39}
68%|██████████████████████████████████████████████████▊ | 1511/2230 [5:04:53<3:35:39, 18.00s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:04:42,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1111, 'learning_rate': 0.00012485549132947976, 'epoch': 3.39}
[WARNING|modeling_utils.py:388] 2022-03-23 22:04:52,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.133, 'learning_rate': 0.00012468208092485548, 'epoch': 3.4}
{'loss': 0.1339, 'learning_rate': 0.0001245086705202312, 'epoch': 3.4}
{'loss': 0.1395, 'learning_rate': 0.00012433526011560692, 'epoch': 3.4}
[WARNING|modeling_utils.py:388] 2022-03-23 22:04:52,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:05:31,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1324, 'learning_rate': 0.00012416184971098267, 'epoch': 3.4}
{'loss': 0.0869, 'learning_rate': 0.00012398843930635836, 'epoch': 3.41}
68%|███████████████████████████████████████████████████ | 1520/2230 [5:06:38<2:17:37, 11.63s/it]
{'loss': 0.1488, 'learning_rate': 0.0001238150289017341, 'epoch': 3.41}
{'loss': 0.129, 'learning_rate': 0.0001236416184971098, 'epoch': 3.41}
[WARNING|modeling_utils.py:388] 2022-03-23 22:06:10,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1255, 'learning_rate': 0.00012346820809248555, 'epoch': 3.41}
[WARNING|modeling_utils.py:388] 2022-03-23 22:06:10,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:06:22,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1568, 'learning_rate': 0.00012329479768786127, 'epoch': 3.41}
[WARNING|modeling_utils.py:388] 2022-03-23 22:06:32,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████▎ | 1524/2230 [5:07:20<2:07:02, 10.80s/it]
{'loss': 0.1269, 'learning_rate': 0.000123121387283237, 'epoch': 3.42}
[WARNING|modeling_utils.py:388] 2022-03-23 22:06:42,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████▎ | 1525/2230 [5:07:31<2:06:01, 10.73s/it]
{'loss': 0.0831, 'learning_rate': 0.0001229479768786127, 'epoch': 3.42}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:06:51,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
68%|███████████████████████████████████████████████████▎ | 1526/2230 [5:07:41<2:03:30, 10.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1454, 'learning_rate': 0.00012277456647398843, 'epoch': 3.42}
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:01,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:11,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:14,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
69%|███████████████████████████████████████████████████▍ | 1528/2230 [5:08:00<1:56:28, 9.96s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:18,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:20,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:22,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:07:24,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1523, 'learning_rate': 0.0001222543352601156, 'epoch': 3.43}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:28,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:30,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:32,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:34,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:36,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:38,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:39,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:41,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:43,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:47,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:48,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:50,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:51,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:54,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:56,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:07:59,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:00,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:02,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:05,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:07,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:09,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:11,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:13,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:15,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2097, 'learning_rate': 0.0001206936416184971, 'epoch': 3.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:18,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:22,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:25,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:29,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.292, 'learning_rate': 0.00012052023121387281, 'epoch': 3.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:32,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:36,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:39,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:43,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2704, 'learning_rate': 0.00012034682080924855, 'epoch': 3.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:46,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:50,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:53,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:56,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:08:56,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:00,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:00,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:03,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:03,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:07,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:07,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2087, 'learning_rate': 0.00011982658959537571, 'epoch': 3.46} [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.2058, 'learning_rate': 0.00011965317919075144, 'epoch': 3.46} [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.173, 'learning_rate': 0.00011947976878612715, 'epoch': 3.46} [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1584, 'learning_rate': 0.00011930635838150289, 'epoch': 3.47} [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.2082, 'learning_rate': 0.0001191329479768786, 'epoch': 3.47} [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:09:10,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1624, 'learning_rate': 0.00011895953757225433, 'epoch': 3.47} 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1681, 'learning_rate': 0.00011878612716763005, 'epoch': 3.47} 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1637, 'learning_rate': 0.00011861271676300578, 'epoch': 3.48} 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 69%|████████████████████████████████████████████████████ | 1548/2230 [5:11:15<2:27:24, 12.97s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1455, 'learning_rate': 0.00011843930635838149, 'epoch': 3.48} Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▏ | 1552/2230 [5:12:06<2:25:04, 12.84s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1189, 'learning_rate': 0.00011826589595375722, 'epoch': 3.48}
{'loss': 0.1575, 'learning_rate': 0.00011809248554913293, 'epoch': 3.48}
{'loss': 0.1567, 'learning_rate': 0.00011791907514450866, 'epoch': 3.48}
{'loss': 0.1484, 'learning_rate': 0.00011774566473988439, 'epoch': 3.49}
{'loss': 0.1359, 'learning_rate': 0.00011757225433526012, 'epoch': 3.49}
{'loss': 0.1224, 'learning_rate': 0.00011739884393063583, 'epoch': 3.49}
{'loss': 0.1439, 'learning_rate': 0.00011722543352601156, 'epoch': 3.49}
{'loss': 0.1238, 'learning_rate': 0.00011705202312138727, 'epoch': 3.5}
70%|████████████████████████████████████████████████████▍ | 1560/2230 [5:13:44<2:16:37, 12.24s/it]
70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it]
{'loss': 0.1132, 'learning_rate': 0.00011653179190751443, 'epoch': 3.5}
{'loss': 0.136, 'learning_rate': 0.00011635838150289016, 'epoch': 3.5}
{'loss': 0.134, 'learning_rate': 0.00011618497109826587, 'epoch': 3.51}
70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1325, 'learning_rate': 0.0001160115606936416, 'epoch': 3.51} 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 70%|████████████████████████████████████████████████████▌ | 1561/2230 [5:13:56<2:15:36, 12.16s/it] Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1259, 'learning_rate': 0.00011583815028901733, 'epoch': 3.51}
[WARNING|modeling_utils.py:388] 2022-03-23 22:14:22,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.16, 'learning_rate': 0.00011566473988439306, 'epoch': 3.51}
{'loss': 0.1253, 'learning_rate': 0.00011549132947976877, 'epoch': 3.52}
70%|████████████████████████████████████████████████████▊ | 1569/2230 [5:15:29<2:05:11, 11.36s/it]
{'loss': 0.1421, 'learning_rate': 0.0001153179190751445, 'epoch': 3.52}
[WARNING|modeling_utils.py:388] 2022-03-23 22:14:49,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1387, 'learning_rate': 0.00011514450867052021, 'epoch': 3.52}
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:09,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:15,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:19,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:25,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.171, 'learning_rate': 0.00011462427745664738, 'epoch': 3.53}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:15:34,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1136, 'learning_rate': 0.0001144508670520231, 'epoch': 3.53}
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:42,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:15:42,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:15:42,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:15:50,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:15:50,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:15:54,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:15:54,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:15:54,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:15:54,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:16:00,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:00,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:16:04,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:16:04,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:16:04,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:16:08,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:16:10,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:16:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:16:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:16:16,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:06:57,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:16:19,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:22,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████ | 1579/2230 [5:17:08<1:41:04, 9.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:25,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:27,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:28,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:30,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▏ | 1580/2230 [5:17:16<1:36:29, 8.91s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:32,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:34,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:36,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▏ | 1581/2230 [5:17:24<1:31:17, 8.44s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:40,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:43,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:45,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▏ | 1582/2230 [5:17:30<1:25:45, 7.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:46,908 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:49,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:51,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▏ | 1583/2230 [5:17:36<1:19:28, 7.37s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:52,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:55,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▎ | 1584/2230 [5:17:42<1:12:49, 6.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:16:59,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:01,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▎ | 1585/2230 [5:17:47<1:06:19, 6.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:02,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:05,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|██████████████████████████████████████████████████████▊ | 1586/2230 [5:17:51<59:51, 5.58s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:07,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:09,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:11,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:13,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|██████████████████████████████████████████████████████▊ | 1588/2230 [5:17:58<48:46, 4.56s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:15,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:18,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:22,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:25,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▍ | 1589/2230 [5:18:12<1:19:51, 7.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:29,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:32,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:36,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:39,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▍ | 1590/2230 [5:18:26<1:40:07, 9.39s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:43,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:46,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:50,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:53,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▌ | 1591/2230 [5:18:40<1:53:45, 10.68s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:17:56,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:18:00,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:18:03,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:18:06,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2053, 'learning_rate': 0.00011115606936416184, 'epoch': 3.57}
{'loss': 0.192, 'learning_rate': 0.00011098265895953756, 'epoch': 3.57}
{'loss': 0.1852, 'learning_rate': 0.0001108092485549133, 'epoch': 3.58}
{'loss': 0.1857, 'learning_rate': 0.000110635838150289, 'epoch': 3.58}
{'loss': 0.1598, 'learning_rate': 0.00011046242774566474, 'epoch': 3.58}
{'loss': 0.1605, 'learning_rate': 0.00011028901734104044, 'epoch': 3.58}
{'loss': 0.1873, 'learning_rate': 0.00011011560693641618, 'epoch': 3.59}
{'loss': 0.1568, 'learning_rate': 0.0001099421965317919, 'epoch': 3.59}
{'loss': 0.1084, 'learning_rate': 0.00010976878612716762, 'epoch': 3.59}
{'loss': 0.1516, 'learning_rate': 0.00010959537572254334, 'epoch': 3.59}
{'loss': 0.1708, 'learning_rate': 0.00010942196531791907, 'epoch': 3.59}
{'loss': 0.1408, 'learning_rate': 0.00010924855491329478, 'epoch': 3.6}
{'loss': 0.1296, 'learning_rate': 0.00010907514450867051, 'epoch': 3.6} 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1297, 'learning_rate': 0.00010890173410404623, 'epoch': 3.6} 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1442, 'learning_rate': 0.00010872832369942196, 'epoch': 3.6} 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 71%|█████████████████████████████████████████████████████▌ | 1592/2230 [5:18:53<2:02:33, 11.53s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1224, 'learning_rate': 0.00010855491329479768, 'epoch': 3.61} 72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1608/2230 [5:22:20<2:09:13, 12.47s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1211, 'learning_rate': 0.00010838150289017341, 'epoch': 3.61} 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1337, 'learning_rate': 0.00010820809248554912, 'epoch': 3.61} 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1649, 'learning_rate': 0.00010803468208092485, 'epoch': 3.61} 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████ | 1609/2230 [5:22:32<2:08:34, 12.42s/it][WARNING|modeling_bart.py:1051] 2022-03-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.128, 'learning_rate': 0.00010786127167630056, 'epoch': 3.61} [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1535, 'learning_rate': 0.00010768786127167629, 'epoch': 3.62} [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1381, 'learning_rate': 0.00010751445086705201, 'epoch': 3.62} [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:22:20,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████▎ | 1615/2230 [5:23:43<2:01:53, 11.89s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████▎ | 1615/2230 [5:23:43<2:01:53, 11.89s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1441, 'learning_rate': 0.00010734104046242773, 'epoch': 3.62} 72%|██████████████████████████████████████████████████████▎ | 1615/2230 [5:23:43<2:01:53, 11.89s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 72%|██████████████████████████████████████████████████████▎ | 1615/2230 [5:23:43<2:01:53, 11.89s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1232, 'learning_rate': 0.00010716763005780346, 'epoch': 3.62} [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:07,947 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 73%|██████████████████████████████████████████████████████▍ | 1617/2230 [5:24:06<1:59:08, 11.66s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
73%|██████████████████████████████████████████████████████▍ | 1617/2230 [5:24:06<1:59:08, 11.66s/it]g-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1301, 'learning_rate': 0.00010699421965317919, 'epoch': 3.63} [WARNING|modeling_utils.py:388] 2022-03-23 22:23:26,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:26,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:23:30,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:18:10,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1528, 'learning_rate': 0.0001068208092485549, 'epoch': 3.63}
73%|██████████████████████████████████████████████████████▍ | 1619/2230 [5:24:29<1:56:27, 11.44s/it]
{'loss': 0.1144, 'learning_rate': 0.00010664739884393063, 'epoch': 3.63}
{'loss': 0.1606, 'learning_rate': 0.00010647398843930635, 'epoch': 3.63}
{'loss': 0.4729, 'learning_rate': 0.00010630057803468207, 'epoch': 3.63}
[WARNING|modeling_utils.py:388] 2022-03-23 22:24:13,104 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████▌ | 1622/2230 [5:25:01<1:51:18, 10.98s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:24:23,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████▌ | 1623/2230 [5:25:12<1:49:33, 10.83s/it]
{'loss': 0.1248, 'learning_rate': 0.00010595375722543353, 'epoch': 3.64}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:24:32,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
73%|██████████████████████████████████████████████████████▌ | 1624/2230 [5:25:22<1:47:30, 10.64s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:24:40,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████▋ | 1625/2230 [5:25:32<1:46:19, 10.54s/it]
{'loss': 0.1429, 'learning_rate': 0.00010560693641618497, 'epoch': 3.64}
[WARNING|modeling_utils.py:388] 2022-03-23 22:24:54,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████▋ | 1626/2230 [5:25:42<1:43:33, 10.29s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:00,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:02,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:25:06,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.148, 'learning_rate': 0.00010526011560693641, 'epoch': 3.65}
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:10,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:12,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
73%|██████████████████████████████████████████████████████▊ | 1628/2230 [5:26:00<1:37:02, 9.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1268, 'learning_rate': 0.00010508670520231213, 'epoch': 3.65}
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:20,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:22,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:24,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:26,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:28,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:30,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:32,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:34,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:36,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:37,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:39,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:43,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:44,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:46,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:48,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:51,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:25:52,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:25:52,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:25:55,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:25:56,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:25:56,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:25:59,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:01,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:02,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:04,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:06,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:06,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:08,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:11,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:11,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:13,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:14,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:14,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.2565, 'learning_rate': 0.00010335260115606935, 'epoch': 3.67} [WARNING|modeling_utils.py:388] 2022-03-23 22:26:17,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:21,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:21,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:25,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:25,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:25,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:28,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:28,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:32,066 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:35,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:35,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:38,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:38,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:38,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:42,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:45,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:45,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:49,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:49,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:52,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:56,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:26:56,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.202, 'learning_rate': 0.00010283236994219653, 'epoch': 3.68} [WARNING|modeling_utils.py:388] 2022-03-23 22:26:59,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:26:59,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:02,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:27:06,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:06,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2011, 'learning_rate': 0.00010265895953757225, 'epoch': 3.68} [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.1397, 'learning_rate': 0.00010248554913294798, 'epoch': 3.68} [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1991, 'learning_rate': 0.00010231213872832369, 'epoch': 3.69} [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.143, 'learning_rate': 0.00010213872832369942, 'epoch': 3.69} [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-23 22:27:09,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it] 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it]
{'loss': 0.181, 'learning_rate': 0.00010196531791907513, 'epoch': 3.69} 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it] 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it] 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it] 74%|███████████████████████████████████████████████████████▎ | 1646/2230 [5:28:47<2:05:22, 12.88s/it]
{'loss': 0.1497, 'learning_rate': 0.00010179190751445086, 'epoch': 3.69}
{'loss': 0.2158, 'learning_rate': 0.00010161849710982658, 'epoch': 3.7}
74%|███████████████████████████████████████████████████████▍ | 1649/2230 [5:29:26<2:05:33, 12.97s/it]
{'loss': 0.1534, 'learning_rate': 0.0001014450867052023, 'epoch': 3.7}
{'loss': 0.16, 'learning_rate': 0.00010127167630057803, 'epoch': 3.7}
{'loss': 0.1698, 'learning_rate': 0.00010109826589595376, 'epoch': 3.7}
{'loss': 0.1082, 'learning_rate': 0.00010092485549132947, 'epoch': 3.7}
{'loss': 0.1202, 'learning_rate': 0.0001007514450867052, 'epoch': 3.71}
{'loss': 0.1296, 'learning_rate': 0.00010057803468208092, 'epoch': 3.71}
{'loss': 0.1141, 'learning_rate': 0.00010040462427745664, 'epoch': 3.71}
{'loss': 0.1072, 'learning_rate': 0.00010023121387283236, 'epoch': 3.71}
{'loss': 0.1284, 'learning_rate': 0.0001000578034682081, 'epoch': 3.72}
{'loss': 0.1583, 'learning_rate': 9.98843930635838e-05, 'epoch': 3.72}
{'loss': 0.1118, 'learning_rate': 9.971098265895953e-05, 'epoch': 3.72}
{'loss': 0.1163, 'learning_rate': 9.953757225433525e-05, 'epoch': 3.72}
[WARNING|modeling_utils.py:388] 2022-03-23 22:31:11,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0947, 'learning_rate': 9.936416184971097e-05, 'epoch': 3.72} [WARNING|modeling_utils.py:388] 2022-03-23 22:31:11,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:31:11,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:31:11,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:31:11,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
75%|███████████████████████████████████████████████████████▉ | 1662/2230 [5:32:08<1:54:25, 12.09s/it]
{'loss': 0.097, 'learning_rate': 9.91907514450867e-05, 'epoch': 3.73}
75%|███████████████████████████████████████████████████████▉ | 1663/2230 [5:32:20<1:54:07, 12.08s/it]
{'loss': 0.1164, 'learning_rate': 9.901734104046241e-05, 'epoch': 3.73}
{'loss': 0.1529, 'learning_rate': 9.884393063583814e-05, 'epoch': 3.73}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:31:52,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1325, 'learning_rate': 9.867052023121385e-05, 'epoch': 3.73}
75%|████████████████████████████████████████████████████████ | 1666/2230 [5:32:54<1:50:19, 11.74s/it]
{'loss': 0.1572, 'learning_rate': 9.83236994219653e-05, 'epoch': 3.74}
75%|████████████████████████████████████████████████████████ | 1668/2230 [5:33:17<1:48:14, 11.56s/it]
{'loss': 0.1051, 'learning_rate': 9.815028901734104e-05, 'epoch': 3.74}
{'loss': 0.1572, 'learning_rate': 9.797687861271675e-05, 'epoch': 3.74}
[WARNING|modeling_utils.py:388] 2022-03-23 22:32:49,908 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1192, 'learning_rate': 9.780346820809248e-05, 'epoch': 3.74}
[WARNING|modeling_utils.py:388] 2022-03-23 22:33:02,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
75%|████████████████████████████████████████████████████████▏ | 1671/2230 [5:33:50<1:44:49, 11.25s/it]
{'loss': 0.1136, 'learning_rate': 9.763005780346819e-05, 'epoch': 3.75}
{'loss': 0.1003, 'learning_rate': 9.745664739884392e-05, 'epoch': 3.75}
[WARNING|modeling_utils.py:388] 2022-03-23 22:33:22,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1195, 'learning_rate': 9.728323699421964e-05, 'epoch': 3.75}
[WARNING|modeling_utils.py:388] 2022-03-23 22:33:32,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:33:32,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:33:39,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:33:39,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1246, 'learning_rate': 9.710982658959536e-05, 'epoch': 3.75} [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:43,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:43,709 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:43,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:43,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:43,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1734, 'learning_rate': 9.693641618497108e-05, 'epoch': 3.76} [WARNING|modeling_utils.py:388] 2022-03-23 22:33:53,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:33:53,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:57,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:57,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 22:33:57,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:01,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:34:05,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
75%|████████████████████████████████████████████████████████▍ | 1677/2230 [5:34:52<1:32:59, 10.09s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:09,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:12,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:34:16,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1254, 'learning_rate': 9.641618497109826e-05, 'epoch': 3.76}
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:19,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:21,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:24,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:26,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:28,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:30,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:32,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:34,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:35,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:37,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:39,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:41,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:43,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:46,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:48,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:49,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:51,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:54,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:55,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:34:58,274 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:00,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:01,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:04,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:06,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:08,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:10,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:12,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:14,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:17,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:20,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:24,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:27,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:31,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:34,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:38,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:41,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:45,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:48,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:51,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:35:55,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:35:58,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:01,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:01,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:05,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:36:05,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:08,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:08,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1759, 'learning_rate': 9.38150289017341e-05, 'epoch': 3.8} [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:36:12,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1882, 'learning_rate': 9.346820809248554e-05, 'epoch': 3.8} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1517, 'learning_rate': 9.329479768786127e-05, 'epoch': 3.8} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1269, 'learning_rate': 9.312138728323698e-05, 'epoch': 3.8} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1485, 'learning_rate': 9.294797687861271e-05, 'epoch': 3.81} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1608, 'learning_rate': 9.277456647398842e-05, 'epoch': 3.81} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1159, 'learning_rate': 9.260115606936415e-05, 'epoch': 3.81} 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 76%|████████████████████████████████████████████████████████▉ | 1694/2230 [5:37:21<1:51:11, 12.45s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1598, 'learning_rate': 9.242774566473988e-05, 'epoch': 3.81}
{'loss': 0.1181, 'learning_rate': 9.22543352601156e-05, 'epoch': 3.82}
{'loss': 0.1478, 'learning_rate': 9.208092485549132e-05, 'epoch': 3.82}
{'loss': 0.0908, 'learning_rate': 9.190751445086705e-05, 'epoch': 3.82}
{'loss': 0.1451, 'learning_rate': 9.173410404624276e-05, 'epoch': 3.82}
{'loss': 0.1336, 'learning_rate': 9.156069364161849e-05, 'epoch': 3.83}
77%|█████████████████████████████████████████████████████████▍ | 1707/2230 [5:40:07<1:48:16, 12.42s/it]
{'loss': 0.1364, 'learning_rate': 9.121387283236993e-05, 'epoch': 3.83}
{'loss': 0.1621, 'learning_rate': 9.104046242774565e-05, 'epoch': 3.83}
{'loss': 0.1221, 'learning_rate': 9.086705202312139e-05, 'epoch': 3.83}
{'loss': 0.1516, 'learning_rate': 9.06936416184971e-05, 'epoch': 3.84}
77%|█████████████████████████████████████████████████████████▌ | 1712/2230 [5:41:06<1:43:33, 12.00s/it]
{'loss': 0.1168, 'learning_rate': 9.052023121387283e-05, 'epoch': 3.84}
77%|█████████████████████████████████████████████████████████▌ | 1713/2230 [5:41:19<1:43:32, 12.02s/it]
{'loss': 0.1298, 'learning_rate': 9.034682080924854e-05, 'epoch': 3.84}
77%|█████████████████████████████████████████████████████████▌ | 1713/2230 [5:41:19<1:43:32, 12.02s/it]
{'loss': 0.1345, 'learning_rate': 9.017341040462427e-05, 'epoch': 3.84}
[WARNING|modeling_utils.py:388] 2022-03-23 22:40:57,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1037, 'learning_rate': 8.999999999999999e-05, 'epoch': 3.85}
{'loss': 0.1118, 'learning_rate': 8.982658959537573e-05, 'epoch': 3.85}
[WARNING|modeling_utils.py:388] 2022-03-23 22:40:57,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
77%|█████████████████████████████████████████████████████████▋ | 1717/2230 [5:42:04<1:38:34, 11.53s/it]
77%|█████████████████████████████████████████████████████████▋ | 1717/2230 [5:42:04<1:38:34, 11.53s/it]
{'loss': 0.1032, 'learning_rate': 8.965317919075143e-05, 'epoch': 3.85}
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:24,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:24,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1485, 'learning_rate': 8.947976878612717e-05, 'epoch': 3.85}
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:24,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:40,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1301, 'learning_rate': 8.930635838150287e-05, 'epoch': 3.85}
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:45,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:45,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:41:49,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
77%|█████████████████████████████████████████████████████████▊ | 1720/2230 [5:42:37<1:35:07, 11.19s/it]
{'loss': 0.126, 'learning_rate': 8.913294797687861e-05, 'epoch': 3.86}
77%|█████████████████████████████████████████████████████████▊ | 1720/2230 [5:42:37<1:35:07, 11.19s/it]
{'loss': 0.1263, 'learning_rate': 8.895953757225433e-05, 'epoch': 3.86}
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:15,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:15,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:26,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:42:34,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0905, 'learning_rate': 8.84393063583815e-05, 'epoch': 3.87}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:42:34,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:42,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:42,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1383, 'learning_rate': 8.826589595375721e-05, 'epoch': 3.87}
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:48,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:42:54,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0949, 'learning_rate': 8.809248554913295e-05, 'epoch': 3.87}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:42:59,020 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:02,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:05,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1031, 'learning_rate': 8.791907514450865e-05, 'epoch': 3.87}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:43:09,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:43:11,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████ | 1728/2230 [5:43:57<1:20:27, 9.62s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:15,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:17,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:21,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:23,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:25,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:27,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:28,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:30,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:32,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:34,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:36,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:37,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:41,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:42,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:44,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:47,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:48,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:51,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:53,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:54,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:57,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:43:59,232 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:01,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:03,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:05,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:07,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:09,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1394, 'learning_rate': 8.601156069364162e-05, 'epoch': 3.9}
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:12,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:16,331 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:19,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:23,306 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:26,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:30,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:33,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:37,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:40,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:44,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:47,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:50,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1601, 'learning_rate': 8.549132947976878e-05, 'epoch': 3.9}
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:54,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:44:57,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:45:01,066 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:45:04,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1933, 'learning_rate': 8.53179190751445e-05, 'epoch': 3.91}
[WARNING|modeling_utils.py:388] 2022-03-23 22:45:07,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:45:11,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1663, 'learning_rate': 8.514450867052023e-05, 'epoch': 3.91}
{'loss': 0.1629, 'learning_rate': 8.497109826589596e-05, 'epoch': 3.91}
78%|██████████████████████████████████████████████████████████▋ | 1745/2230 [5:46:28<1:41:54, 12.61s/it]
{'loss': 0.1432, 'learning_rate': 8.479768786127167e-05, 'epoch': 3.91}
{'loss': 0.1363, 'learning_rate': 8.46242774566474e-05, 'epoch': 3.91}
{'loss': 0.1099, 'learning_rate': 8.445086705202311e-05, 'epoch': 3.92}
78%|██████████████████████████████████████████████████████████▊ | 1748/2230 [5:47:07<1:42:46, 12.79s/it]
{'loss': 0.1364, 'learning_rate': 8.410404624277456e-05, 'epoch': 3.92}
{'loss': 0.1443, 'learning_rate': 8.393063583815028e-05, 'epoch': 3.92}
{'loss': 0.1386, 'learning_rate': 8.3757225433526e-05, 'epoch': 3.93}
{'loss': 0.1371, 'learning_rate': 8.358381502890174e-05, 'epoch': 3.93}
79%|██████████████████████████████████████████████████████████▉ | 1753/2230 [5:48:10<1:40:23, 12.63s/it]
{'loss': 0.1255, 'learning_rate': 8.341040462427745e-05, 'epoch': 3.93}
{'loss': 0.1153, 'learning_rate': 8.323699421965317e-05, 'epoch': 3.93}
{'loss': 0.1359, 'learning_rate': 8.30635838150289e-05, 'epoch': 3.93}
{'loss': 0.1598, 'learning_rate': 8.289017341040461e-05, 'epoch': 3.94}
{'loss': 0.1175, 'learning_rate': 8.271676300578034e-05, 'epoch': 3.94}
{'loss': 0.0973, 'learning_rate': 8.254335260115605e-05, 'epoch': 3.94}
{'loss': 0.1235, 'learning_rate': 8.236994219653178e-05, 'epoch': 3.94}
{'loss': 0.128, 'learning_rate': 8.21965317919075e-05, 'epoch': 3.95}
79%|███████████████████████████████████████████████████████████▏ | 1761/2230 [5:49:47<1:33:46, 12.00s/it]
{'loss': 0.1162, 'learning_rate': 8.184971098265895e-05, 'epoch': 3.95}
{'loss': 0.1021, 'learning_rate': 8.167630057803468e-05, 'epoch': 3.95}
79%|███████████████████████████████████████████████████████████▎ | 1764/2230 [5:50:22<1:31:22, 11.76s/it]
{'loss': 0.1703, 'learning_rate': 8.150289017341039e-05, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-23 22:49:42,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 22:49:48,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1103, 'learning_rate': 8.132947976878612e-05, 'epoch': 3.96}
79%|███████████████████████████████████████████████████████████▍ | 1766/2230 [5:50:45<1:28:51, 11.49s/it]
{'loss': 0.1656, 'learning_rate': 8.115606936416184e-05, 'epoch': 3.96}
{'loss': 0.1121, 'learning_rate': 8.098265895953756e-05, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-23 22:50:17,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0983, 'learning_rate': 8.080924855491328e-05, 'epoch': 3.96}
79%|███████████████████████████████████████████████████████████▍ | 1769/2230 [5:51:17<1:24:59, 11.06s/it]
79%|███████████████████████████████████████████████████████████▍ | 1769/2230 [5:51:17<1:24:59, 11.06s/it]g-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1085, 'learning_rate': 8.063583815028902e-05, 'epoch': 3.97} [WARNING|modeling_utils.py:388] 2022-03-23 22:50:37,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:50:37,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:50:37,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:50:43,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1103, 'learning_rate': 8.046242774566472e-05, 'epoch': 3.97}
[WARNING|modeling_utils.py:388] 2022-03-23 22:50:47,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:50:54,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1093, 'learning_rate': 8.028901734104046e-05, 'epoch': 3.97}
[WARNING|modeling_utils.py:388] 2022-03-23 22:50:57,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:04,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.097, 'learning_rate': 8.011560693641617e-05, 'epoch': 3.97}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:51:08,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:12,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.102, 'learning_rate': 7.99421965317919e-05, 'epoch': 3.98}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:51:20,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 22:51:22,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1105, 'learning_rate': 7.976878612716762e-05, 'epoch': 3.98}
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:26,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:28,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:30,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:32,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:34,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:36,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:38,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:40,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:42,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:44,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:46,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:48,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:49,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:51,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:54,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:56,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:51:59,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:00,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:02,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:04,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:07,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:08,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:10,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:12,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:14,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:17,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:18,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:20,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:22,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:26,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:30,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:33,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1757, 'learning_rate': 7.786127167630058e-05, 'epoch': 4.0}
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:37,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:40,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:44,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:47,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1199, 'learning_rate': 7.768786127167628e-05, 'epoch': 4.0}
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:51,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:54,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:52:57,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:01,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:01,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1245, 'learning_rate': 7.751445086705202e-05, 'epoch': 4.01} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:04,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:11,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:11,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:15,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:15,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0766, 'learning_rate': 7.716763005780346e-05, 'epoch': 4.01} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1122, 'learning_rate': 7.699421965317918e-05, 'epoch': 4.01} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0899, 'learning_rate': 7.682080924855491e-05, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0932, 'learning_rate': 7.664739884393062e-05, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1327, 'learning_rate': 7.647398843930635e-05, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:53:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1113, 'learning_rate': 7.630057803468207e-05, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.087, 'learning_rate': 7.61271676300578e-05, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0836, 'learning_rate': 7.595375722543352e-05, 'epoch': 4.03} [WARNING|modeling_utils.py:388] 2022-03-23 22:54:34,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0778, 'learning_rate': 7.578034682080925e-05, 'epoch': 4.03}
{'loss': 0.0863, 'learning_rate': 7.560693641618496e-05, 'epoch': 4.03}
{'loss': 0.1162, 'learning_rate': 7.52601156069364e-05, 'epoch': 4.04}
{'loss': 0.0966, 'learning_rate': 7.508670520231213e-05, 'epoch': 4.04}
{'loss': 0.0732, 'learning_rate': 7.491329479768785e-05, 'epoch': 4.04}
{'loss': 0.0816, 'learning_rate': 7.473988439306357e-05, 'epoch': 4.04}
{'loss': 0.0554, 'learning_rate': 7.45664739884393e-05, 'epoch': 4.04}
{'loss': 0.087, 'learning_rate': 7.439306358381502e-05, 'epoch': 4.05}
{'loss': 0.0752, 'learning_rate': 7.421965317919074e-05, 'epoch': 4.05}
81%|████████████████████████████████████████████████████████████▊ | 1807/2230 [5:58:03<1:25:47, 12.17s/it]
{'loss': 0.0659, 'learning_rate': 7.404624277456646e-05, 'epoch': 4.05}
81%|████████████████████████████████████████████████████████████▊ | 1808/2230 [5:58:14<1:24:43, 12.05s/it]
{'loss': 0.0671, 'learning_rate': 7.387283236994219e-05, 'epoch': 4.05}
{'loss': 0.0664, 'learning_rate': 7.369942196531791e-05, 'epoch': 4.06}
{'loss': 0.0496, 'learning_rate': 7.352601156069363e-05, 'epoch': 4.06}
[WARNING|modeling_utils.py:388] 2022-03-23 22:58:03,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:58:03,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0615, 'learning_rate': 7.335260115606935e-05, 'epoch': 4.06} [WARNING|modeling_utils.py:388] 2022-03-23 22:58:03,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:58:03,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 22:58:03,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:25:16,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0902, 'learning_rate': 7.317919075144507e-05, 'epoch': 4.06}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:58:26,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
81%|████████████████████████████████████████████████████████████▉ | 1813/2230 [5:59:13<1:21:18, 11.70s/it]
{'loss': 0.0814, 'learning_rate': 7.30057803468208e-05, 'epoch': 4.07}
[WARNING|modeling_utils.py:388] 2022-03-23 22:58:32,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0685, 'learning_rate': 7.283236994219653e-05, 'epoch': 4.07}
81%|█████████████████████████████████████████████████████████████ | 1815/2230 [5:59:35<1:19:02, 11.43s/it]
{'loss': 0.0804, 'learning_rate': 7.265895953757225e-05, 'epoch': 4.07}
{'loss': 0.0699, 'learning_rate': 7.248554913294797e-05, 'epoch': 4.07}
{'loss': 0.0642, 'learning_rate': 7.231213872832369e-05, 'epoch': 4.07}
[WARNING|modeling_bart.py:1051] 2022-03-23 22:59:20,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
82%|█████████████████████████████████████████████████████████████▏ | 1818/2230 [6:00:07<1:15:07, 10.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0728, 'learning_rate': 7.213872832369941e-05, 'epoch': 4.08}
82%|█████████████████████████████████████████████████████████████▏ | 1819/2230 [6:00:17<1:13:34, 10.74s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 22:59:36,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 22:59:42,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0681, 'learning_rate': 7.179190751445085e-05, 'epoch': 4.08}
[WARNING|modeling_utils.py:388] 2022-03-23 22:59:52,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.055, 'learning_rate': 7.161849710982659e-05, 'epoch': 4.08}
[WARNING|modeling_utils.py:388] 2022-03-23 22:59:58,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:02,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:00:06,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:10,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
82%|█████████████████████████████████████████████████████████████▎ | 1823/2230 [6:00:56<1:06:58, 9.87s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 23:00:14,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:00:16,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:00:18,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:00:21,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:24,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:26,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:28,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:31,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:33,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:35,295 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:37,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:39,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:40,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:42,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:44,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:46,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:49,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:51,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:52,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:55,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:56,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:00:58,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:01,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:03,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:04,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:07,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:09,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:11,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:13,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:15,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:17,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1101, 'learning_rate': 6.936416184971097e-05, 'epoch': 4.11}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:21,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:24,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:28,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:31,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:35,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:38,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:41,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:45,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:48,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:52,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:55,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:01:59,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0971, 'learning_rate': 6.884393063583815e-05, 'epoch': 4.12}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:02:02,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:02:05,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:02:09,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:02:12,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:12,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.092, 'learning_rate': 6.867052023121387e-05, 'epoch': 4.12} [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1004, 'learning_rate': 6.849710982658959e-05, 'epoch': 4.12} [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-23 23:02:16,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1173, 'learning_rate': 6.832369942196531e-05, 'epoch': 4.13}
{'loss': 0.091, 'learning_rate': 6.815028901734103e-05, 'epoch': 4.13}
{'loss': 0.1089, 'learning_rate': 6.797687861271676e-05, 'epoch': 4.13}
{'loss': 0.0859, 'learning_rate': 6.780346820809248e-05, 'epoch': 4.13}
83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it]
{'loss': 0.0726, 'learning_rate': 6.76300578034682e-05, 'epoch': 4.13}
{'loss': 0.084, 'learning_rate': 6.745664739884392e-05, 'epoch': 4.14}
{'loss': 0.0684, 'learning_rate': 6.728323699421964e-05, 'epoch': 4.14}
{'loss': 0.0713, 'learning_rate': 6.710982658959537e-05, 'epoch': 4.14}
{'loss': 0.0874, 'learning_rate': 6.693641618497109e-05, 'epoch': 4.14}
{'loss': 0.0691, 'learning_rate': 6.676300578034682e-05, 'epoch': 4.15}
{'loss': 0.0745, 'learning_rate': 6.658959537572254e-05, 'epoch': 4.15}
{'loss': 0.0953, 'learning_rate': 6.641618497109826e-05, 'epoch': 4.15}
{'loss': 0.0935, 'learning_rate': 6.624277456647398e-05, 'epoch': 4.15}
83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0775, 'learning_rate': 6.60693641618497e-05, 'epoch': 4.15} 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0828, 'learning_rate': 6.589595375722542e-05, 'epoch': 4.16} 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1012, 'learning_rate': 6.572254335260114e-05, 'epoch': 4.16} 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1145, 'learning_rate': 6.554913294797688e-05, 'epoch': 4.16} 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████ | 1844/2230 [6:04:17<1:23:31, 12.98s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
83%|██████████████████████████████████████████████████████████████▍ | 1857/2230 [6:06:59<1:15:27, 12.14s/it]
{'loss': 0.0819, 'learning_rate': 6.53757225433526e-05, 'epoch': 4.16}
{'loss': 0.0653, 'learning_rate': 6.520231213872832e-05, 'epoch': 4.17}
{'loss': 0.095, 'learning_rate': 6.502890173410404e-05, 'epoch': 4.17}
{'loss': 0.0649, 'learning_rate': 6.485549132947976e-05, 'epoch': 4.17}
83%|██████████████████████████████████████████████████████████████▌ | 1861/2230 [6:07:46<1:12:27, 11.78s/it]
{'loss': 0.0629, 'learning_rate': 6.468208092485548e-05, 'epoch': 4.17}
{'loss': 0.0555, 'learning_rate': 6.45086705202312e-05, 'epoch': 4.17}
{'loss': 0.0947, 'learning_rate': 6.433526011560694e-05, 'epoch': 4.18}
83%|██████████████████████████████████████████████████████████████▌ | 1861/2230 [6:07:46<1:12:27, 11.78s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 83%|██████████████████████████████████████████████████████████████▌ | 1861/2230 [6:07:46<1:12:27, 11.78s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 84%|██████████████████████████████████████████████████████████████▋ | 1864/2230 [6:08:20<1:10:36, 11.57s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 84%|██████████████████████████████████████████████████████████████▋ | 1864/2230 [6:08:20<1:10:36, 11.57s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0661, 'learning_rate': 6.416184971098266e-05, 'epoch': 4.18} 84%|██████████████████████████████████████████████████████████████▋ | 1864/2230 [6:08:20<1:10:36, 11.57s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 84%|██████████████████████████████████████████████████████████████▋ | 1864/2230 [6:08:20<1:10:36, 11.57s/it] Setting `use_cache=False`...e computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:07:44,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:07:44,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:07:48,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 23:07:48,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0607, 'learning_rate': 6.398843930635838e-05, 'epoch': 4.18} [WARNING|modeling_utils.py:388] 2022-03-23 23:07:52,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:07:52,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:07:56,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0694, 'learning_rate': 6.38150289017341e-05, 'epoch': 4.18}
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:05,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
84%|██████████████████████████████████████████████████████████████▊ | 1867/2230 [6:08:53<1:07:47, 11.21s/it]
{'loss': 0.0924, 'learning_rate': 6.364161849710982e-05, 'epoch': 4.19}
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:13,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:19,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.089, 'learning_rate': 6.346820809248554e-05, 'epoch': 4.19}
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:31,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:37,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0567, 'learning_rate': 6.3121387283237e-05, 'epoch': 4.19}
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:47,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0759, 'learning_rate': 6.294797687861272e-05, 'epoch': 4.2}
[WARNING|modeling_utils.py:388] 2022-03-23 23:08:54,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:08:58,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
84%|██████████████████████████████████████████████████████████████▉ | 1872/2230 [6:09:44<1:01:21, 10.28s/it]
{'loss': 0.0725, 'learning_rate': 6.277456647398844e-05, 'epoch': 4.2}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:09:04,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:08,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0885, 'learning_rate': 6.260115606936416e-05, 'epoch': 4.2}
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:14,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:16,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:09:20,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:09:22,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:09:24,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:09:26,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0758, 'learning_rate': 6.22543352601156e-05, 'epoch': 4.2}
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:30,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:32,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:34,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:36,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:38,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:40,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:42,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:43,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:47,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:48,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:50,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:51,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:54,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:56,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:09:59,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:00,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:02,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:05,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:06,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:08,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:10,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:13,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:14,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:16,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:18,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:22,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:25,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:29,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:33,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:36,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:39,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:43,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1361, 'learning_rate': 6.034682080924855e-05, 'epoch': 4.23}
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:46,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:50,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:53,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:10:57,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1013, 'learning_rate': 6.0173410404624274e-05, 'epoch': 4.23}
[WARNING|modeling_utils.py:388] 2022-03-23 23:11:00,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:11:04,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:11:07,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:11:11,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0918, 'learning_rate': 5.9999999999999995e-05, 'epoch': 4.23}
{'loss': 0.0916, 'learning_rate': 5.982658959537572e-05, 'epoch': 4.24}
{'loss': 0.0794, 'learning_rate': 5.965317919075144e-05, 'epoch': 4.24}
 85%|███████████████████████████████████████████████████████████████▌ | 1891/2230 [6:12:37<1:12:09, 12.77s/it]
{'loss': 0.1023, 'learning_rate': 5.9479768786127164e-05, 'epoch': 4.24}
{'loss': 0.1047, 'learning_rate': 5.930635838150289e-05, 'epoch': 4.24}
{'loss': 0.0789, 'learning_rate': 5.913294797687861e-05, 'epoch': 4.24}
{'loss': 0.0798, 'learning_rate': 5.895953757225433e-05, 'epoch': 4.25}
{'loss': 0.0895, 'learning_rate': 5.878612716763006e-05, 'epoch': 4.25}
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0754, 'learning_rate': 5.861271676300578e-05, 'epoch': 4.25} g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|███████████████████████████████████████████████████████████████▊ | 1897/2230 [6:13:55<1:11:43, 12.92s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|███████████████████████████████████████████████████████████████▊ | 1897/2230 [6:13:55<1:11:43, 12.92s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.1032, 'learning_rate': 5.84393063583815e-05, 'epoch': 4.25} 85%|███████████████████████████████████████████████████████████████▊ | 1897/2230 [6:13:55<1:11:43, 12.92s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|███████████████████████████████████████████████████████████████▊ | 1897/2230 [6:13:55<1:11:43, 12.92s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0945, 'learning_rate': 5.8265895953757215e-05, 'epoch': 4.26}
{'loss': 0.0874, 'learning_rate': 5.8092485549132936e-05, 'epoch': 4.26}
{'loss': 0.0859, 'learning_rate': 5.791907514450866e-05, 'epoch': 4.26}
{'loss': 0.0794, 'learning_rate': 5.7745664739884384e-05, 'epoch': 4.26}
85%|███████████████████████████████████████████████████████████████▉ | 1902/2230 [6:14:59<1:09:23, 12.69s/it]
{'loss': 0.0629, 'learning_rate': 5.7572254335260105e-05, 'epoch': 4.26}
85%|████████████████████████████████████████████████████████████████ | 1903/2230 [6:15:11<1:08:44, 12.61s/it]
{'loss': 0.087, 'learning_rate': 5.739884393063583e-05, 'epoch': 4.27}
{'loss': 0.0803, 'learning_rate': 5.722543352601155e-05, 'epoch': 4.27}
{'loss': 0.0901, 'learning_rate': 5.705202312138727e-05, 'epoch': 4.27}
{'loss': 0.1085, 'learning_rate': 5.6878612716762994e-05, 'epoch': 4.27}
{'loss': 0.0724, 'learning_rate': 5.670520231213872e-05, 'epoch': 4.28}
85%|████████████████████████████████████████████████████████████████ | 1903/2230 [6:15:11<1:08:44, 12.61s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|████████████████████████████████████████████████████████████████ | 1903/2230 [6:15:11<1:08:44, 12.61s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|████████████████████████████████████████████████████████████████ | 1903/2230 [6:15:11<1:08:44, 12.61s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 85%|████████████████████████████████████████████████████████████████ | 1903/2230 [6:15:11<1:08:44, 12.61s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 86%|████████████████████████████████████████████████████████████████▏ | 1908/2230 [6:16:12<1:05:26, 12.19s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 86%|████████████████████████████████████████████████████████████████▏ | 1908/2230 [6:16:12<1:05:26, 12.19s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 86%|████████████████████████████████████████████████████████████████▏ | 1908/2230 [6:16:12<1:05:26, 12.19s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 86%|████████████████████████████████████████████████████████████████▏ | 1908/2230 [6:16:12<1:05:26, 12.19s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 86%|████████████████████████████████████████████████████████████████▏ | 1908/2230 [6:16:12<1:05:26, 12.19s/it]g-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0678, 'learning_rate': 5.635838150289016e-05, 'epoch': 4.28}
{'loss': 0.0715, 'learning_rate': 5.618497109826589e-05, 'epoch': 4.28}
{'loss': 0.0755, 'learning_rate': 5.601156069364161e-05, 'epoch': 4.28}
86%|████████████████████████████████████████████████████████████████▎ | 1912/2230 [6:16:59<1:02:47, 11.85s/it]
{'loss': 0.0718, 'learning_rate': 5.583815028901733e-05, 'epoch': 4.29}
{'loss': 0.0771, 'learning_rate': 5.566473988439306e-05, 'epoch': 4.29}
{'loss': 0.0682, 'learning_rate': 5.549132947976878e-05, 'epoch': 4.29}
{'loss': 0.087, 'learning_rate': 5.53179190751445e-05, 'epoch': 4.29}
[WARNING|modeling_utils.py:388] 2022-03-23 23:16:56,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:17:00,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0566, 'learning_rate': 5.514450867052022e-05, 'epoch': 4.3}
[WARNING|modeling_utils.py:388] 2022-03-23 23:17:08,877 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0778, 'learning_rate': 5.497109826589595e-05, 'epoch': 4.3}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:17:25,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:17:31,229 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0559, 'learning_rate': 5.462427745664739e-05, 'epoch': 4.3}
[WARNING|modeling_utils.py:388] 2022-03-23 23:17:41,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0472, 'learning_rate': 5.445086705202312e-05, 'epoch': 4.3}
[WARNING|modeling_utils.py:388] 2022-03-23 23:17:47,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:17:56,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:00,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:06,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:08,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
86%|██████████████████████████████████████████████████████████████████▍ | 1923/2230 [6:18:56<51:13, 10.01s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:14,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:16,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:18:20,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0811, 'learning_rate': 5.375722543352601e-05, 'epoch': 4.31}
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:24,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:26,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:28,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:31,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:34,165 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:36,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:38,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:39,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:41,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:43,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:45,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:48,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:50,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:52,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:53,717 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:56,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:18:58,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:00,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:03,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:04,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:06,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:08,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:10,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:12,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:15,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:17,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:18,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:22,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:22,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:25,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:32,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:32,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:36,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:36,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:39,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:43,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:43,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:43,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:46,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:50,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:50,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:19:53,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 22:59:24,045 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 23:19:57,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:00,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:04,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:07,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:11,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:14,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:20:18,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1036, 'learning_rate': 5.1156069364161844e-05, 'epoch': 4.35}
{'loss': 0.0725, 'learning_rate': 5.0982658959537565e-05, 'epoch': 4.35}
{'loss': 0.0853, 'learning_rate': 5.080924855491329e-05, 'epoch': 4.35}
{'loss': 0.1104, 'learning_rate': 5.063583815028901e-05, 'epoch': 4.35}
{'loss': 0.0764, 'learning_rate': 5.0462427745664734e-05, 'epoch': 4.36}
 87%|█████████████████████████████████████████████████████████████████▍ | 1944/2230 [6:22:18<1:02:03, 13.02s/it]
{'loss': 0.0781, 'learning_rate': 5.028901734104046e-05, 'epoch': 4.36}
{'loss': 0.0802, 'learning_rate': 5.011560693641618e-05, 'epoch': 4.36}
{'loss': 0.0705, 'learning_rate': 4.99421965317919e-05, 'epoch': 4.36}
{'loss': 0.0725, 'learning_rate': 4.976878612716762e-05, 'epoch': 4.37}
{'loss': 0.054, 'learning_rate': 4.959537572254335e-05, 'epoch': 4.37}
{'loss': 0.0575, 'learning_rate': 4.942196531791907e-05, 'epoch': 4.37}
{'loss': 0.0801, 'learning_rate': 4.924855491329479e-05, 'epoch': 4.37}
{'loss': 0.0763, 'learning_rate': 4.907514450867052e-05, 'epoch': 4.37}
{'loss': 0.0724, 'learning_rate': 4.890173410404624e-05, 'epoch': 4.38}
 88%|███████████████████████████████████████████████████████████████████▍ | 1953/2230 [6:24:12<57:59, 12.56s/it]
{'loss': 0.0776, 'learning_rate': 4.872832369942196e-05, 'epoch': 4.38}
 88%|███████████████████████████████████████████████████████████████████▍ | 1954/2230 [6:24:25<57:28, 12.49s/it]
{'loss': 0.0493, 'learning_rate': 4.855491329479768e-05, 'epoch': 4.38}
 88%|███████████████████████████████████████████████████████████████████▌ | 1955/2230 [6:24:37<56:52, 12.41s/it]
{'loss': 0.0714, 'learning_rate': 4.838150289017341e-05, 'epoch': 4.38}
{'loss': 0.0701, 'learning_rate': 4.820809248554913e-05, 'epoch': 4.39}
{'loss': 0.0742, 'learning_rate': 4.803468208092485e-05, 'epoch': 4.39}
{'loss': 0.0693, 'learning_rate': 4.786127167630058e-05, 'epoch': 4.39}
{'loss': 0.0687, 'learning_rate': 4.76878612716763e-05, 'epoch': 4.39}
 88%|███████████████████████████████████████████████████████████████████▋ | 1960/2230 [6:25:36<53:42, 11.94s/it]
{'loss': 0.0831, 'learning_rate': 4.751445086705202e-05, 'epoch': 4.39}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:25:03,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0627, 'learning_rate': 4.734104046242774e-05, 'epoch': 4.4}
[WARNING|modeling_utils.py:388] 2022-03-23 23:25:15,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0741, 'learning_rate': 4.716763005780347e-05, 'epoch': 4.4}
88%|███████████████████████████████████████████████████████████████████▊ | 1963/2230 [6:26:12<52:26, 11.78s/it]
{'loss': 0.0782, 'learning_rate': 4.682080924855491e-05, 'epoch': 4.4}
[WARNING|modeling_utils.py:388] 2022-03-23 23:25:44,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:25:48,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:25:52,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.044, 'learning_rate': 4.647398843930636e-05, 'epoch': 4.41}
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:06,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:15,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|███████████████████████████████████████████████████████████████████▉ | 1968/2230 [6:27:07<48:31, 11.11s/it]
{'loss': 0.0641, 'learning_rate': 4.61271676300578e-05, 'epoch': 4.41}
{'loss': 0.0749, 'learning_rate': 4.5953757225433526e-05, 'epoch': 4.41}
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:37,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:43,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0604, 'learning_rate': 4.5780346820809246e-05, 'epoch': 4.42}
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:47,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:53,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0612, 'learning_rate': 4.560693641618497e-05, 'epoch': 4.42}
[WARNING|modeling_utils.py:388] 2022-03-23 23:26:59,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|████████████████████████████████████████████████████████████████████ | 1972/2230 [6:27:48<44:14, 10.29s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:05,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:08,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:14,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:16,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:20,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|████████████████████████████████████████████████████████████████████▏ | 1974/2230 [6:28:06<41:37, 9.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0514, 'learning_rate': 4.5086705202312136e-05, 'epoch': 4.43}
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:26,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:27:28,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:32,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:34,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:36,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:38,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:40,744 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:42,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:44,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:46,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:48,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:51,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:52,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:54,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:56,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:27:59,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:00,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:03,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:04,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:07,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:09,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:11,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:13,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:15,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:17,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:19,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:20,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:23,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:27,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:30,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:34,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1403, 'learning_rate': 4.3179190751445084e-05, 'epoch': 4.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:37,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:41,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:44,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:47,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0965, 'learning_rate': 4.300578034682081e-05, 'epoch': 4.45}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:51,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:54,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:28:58,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:01,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1017, 'learning_rate': 4.283236994219653e-05, 'epoch': 4.46}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:04,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:08,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:11,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:15,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:29:18,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0803, 'learning_rate': 4.248554913294798e-05, 'epoch': 4.46}
{'loss': 0.0917, 'learning_rate': 4.23121387283237e-05, 'epoch': 4.46}
{'loss': 0.1152, 'learning_rate': 4.213872832369942e-05, 'epoch': 4.46}
{'loss': 0.0697, 'learning_rate': 4.196531791907514e-05, 'epoch': 4.47}
{'loss': 0.0904, 'learning_rate': 4.179190751445087e-05, 'epoch': 4.47}
{'loss': 0.0707, 'learning_rate': 4.161849710982658e-05, 'epoch': 4.47}
89%|████████████████████████████████████████████████████████████████████▉ | 1995/2230 [6:31:32<50:39, 12.93s/it]
{'loss': 0.0799, 'learning_rate': 4.1445086705202304e-05, 'epoch': 4.47}
90%|████████████████████████████████████████████████████████████████████▉ | 1996/2230 [6:31:45<50:11, 12.87s/it]
{'loss': 0.0883, 'learning_rate': 4.1271676300578025e-05, 'epoch': 4.48}
{'loss': 0.1033, 'learning_rate': 4.109826589595375e-05, 'epoch': 4.48}
{'loss': 0.0797, 'learning_rate': 4.092485549132947e-05, 'epoch': 4.48}
{'loss': 0.0984, 'learning_rate': 4.075144508670519e-05, 'epoch': 4.48}
[INFO|trainer.py:2364] 2022-03-23 23:31:51,942 >> ***** Running Evaluation *****
1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
03/23/2022 23:41:07 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.32974740862846375, 'eval_wer': 0.09492264974216581, 'eval_runtime': 555.2371, 'eval_samples_per_second': 4.758, 'eval_steps_per_second': 0.596, 'epoch': 4.48}
{'loss': 0.0764, 'learning_rate': 4.040462427745664e-05, 'epoch': 4.49}
1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0685, 'learning_rate': 4.023121387283236e-05, 'epoch': 4.49} 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.096, 'learning_rate': 4.005780346820808e-05, 'epoch': 4.49} 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0908, 'learning_rate': 3.988439306358381e-05, 'epoch': 4.49} 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0824, 'learning_rate': 3.953757225433525e-05, 'epoch': 4.5}
{'loss': 0.0638, 'learning_rate': 3.936416184971098e-05, 'epoch': 4.5}
{'loss': 0.0703, 'learning_rate': 3.91907514450867e-05, 'epoch': 4.5}
{'loss': 0.0662, 'learning_rate': 3.901734104046242e-05, 'epoch': 4.5}
{'loss': 0.0581, 'learning_rate': 3.884393063583814e-05, 'epoch': 4.51}
{'loss': 0.0755, 'learning_rate': 3.867052023121387e-05, 'epoch': 4.51}
[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0439, 'learning_rate': 3.849710982658959e-05, 'epoch': 4.51}
90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]
{'loss': 0.0588, 'learning_rate': 3.832369942196531e-05, 'epoch': 4.51}
{'loss': 0.0649, 'learning_rate': 3.815028901734104e-05, 'epoch': 4.52}
[WARNING|modeling_utils.py:388] 2022-03-23 23:45:53,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.071, 'learning_rate': 3.780346820809248e-05, 'epoch': 4.52}
{'loss': 0.09, 'learning_rate': 3.76300578034682e-05, 'epoch': 4.52}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:46:21,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
90%|█████████████████████████████████████████████████████████████████████▋ | 2018/2230 [6:47:08<41:05, 11.63s/it]
{'loss': 0.0448, 'learning_rate': 3.745664739884393e-05, 'epoch': 4.52}
[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0613, 'learning_rate': 3.728323699421965e-05, 'epoch': 4.53}
[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:46:46,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:46:50,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
91%|█████████████████████████████████████████████████████████████████████▊ | 2021/2230 [6:47:39<37:26, 10.75s/it]
[WARNING|modeling_utils.py:388] 2022-03-23 23:46:57,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:03,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0628, 'learning_rate': 3.6763005780346816e-05, 'epoch': 4.53}
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:09,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:47:13,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0486, 'learning_rate': 3.658959537572254e-05, 'epoch': 4.54}
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:17,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:19,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
91%|█████████████████████████████████████████████████████████████████████▉ | 2024/2230 [6:48:07<33:53, 9.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0715, 'learning_rate': 3.6416184971098265e-05, 'epoch': 4.54}
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:27,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:29,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0677, 'learning_rate': 3.6242774566473985e-05, 'epoch': 4.54}
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:35,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:37,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:39,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:41,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:43,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:44,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:46,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:48,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:52,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:53,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:55,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:56,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:47:59,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:01,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:04,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:05,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:07,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:09,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:11,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:13,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:15,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:18,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:19,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:21,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:23,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:27,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:30,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:34,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:37,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:41,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:41,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:44,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:51,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:51,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:55,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:58,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:48:58,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:49:02,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-23 23:49:02,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.1258, 'learning_rate': 3.4161849710982654e-05, 'epoch': 4.57}
[WARNING|modeling_utils.py:388] 2022-03-23 23:49:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:49:08,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:49:12,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:49:15,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1049, 'learning_rate': 3.38150289017341e-05, 'epoch': 4.57}
{'loss': 0.0621, 'learning_rate': 3.364161849710982e-05, 'epoch': 4.57}
{'loss': 0.0679, 'learning_rate': 3.346820809248554e-05, 'epoch': 4.58}
{'loss': 0.0786, 'learning_rate': 3.329479768786127e-05, 'epoch': 4.58}
{'loss': 0.0961, 'learning_rate': 3.312138728323699e-05, 'epoch': 4.58}
 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]
{'loss': 0.098, 'learning_rate': 3.294797687861271e-05, 'epoch': 4.58}
{'loss': 0.0582, 'learning_rate': 3.277456647398844e-05, 'epoch': 4.59}
{'loss': 0.0676, 'learning_rate': 3.260115606936416e-05, 'epoch': 4.59}
{'loss': 0.0756, 'learning_rate': 3.242774566473988e-05, 'epoch': 4.59}
{'loss': 0.0821, 'learning_rate': 3.22543352601156e-05, 'epoch': 4.59}
{'loss': 0.0867, 'learning_rate': 3.208092485549133e-05, 'epoch': 4.59}
{'loss': 0.0538, 'learning_rate': 3.190751445086705e-05, 'epoch': 4.6}
92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]
{'loss': 0.0831, 'learning_rate': 3.173410404624277e-05, 'epoch': 4.6}
{'loss': 0.0608, 'learning_rate': 3.15606936416185e-05, 'epoch': 4.6}
{'loss': 0.067, 'learning_rate': 3.138728323699422e-05, 'epoch': 4.6}
{'loss': 0.0615, 'learning_rate': 3.121387283236994e-05, 'epoch': 4.61}
{'loss': 0.0613, 'learning_rate': 3.1040462427745667e-05, 'epoch': 4.61}
{'loss': 0.0709, 'learning_rate': 3.086705202312139e-05, 'epoch': 4.61}
{'loss': 0.0792, 'learning_rate': 3.069364161849711e-05, 'epoch': 4.61}
{'loss': 0.0584, 'learning_rate': 3.052023121387283e-05, 'epoch': 4.61}
{'loss': 0.071, 'learning_rate': 3.0346820809248553e-05, 'epoch': 4.62}
{'loss': 0.0718, 'learning_rate': 3.0173410404624277e-05, 'epoch': 4.62}
92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]
{'loss': 0.0671, 'learning_rate': 2.9999999999999997e-05, 'epoch': 4.62} 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0882, 'learning_rate': 2.982658959537572e-05, 'epoch': 4.62}
{'loss': 0.0676, 'learning_rate': 2.9653179190751446e-05, 'epoch': 4.63}
[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0508, 'learning_rate': 2.9479768786127166e-05, 'epoch': 4.63}
[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:52,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:56,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [6:55:48<31:02, 11.36s/it]
{'loss': 0.0594, 'learning_rate': 2.9132947976878608e-05, 'epoch': 4.63}
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:10,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.071, 'learning_rate': 2.895953757225433e-05, 'epoch': 4.63}
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:25,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.053, 'learning_rate': 2.8786127167630052e-05, 'epoch': 4.64}
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0668, 'learning_rate': 2.8612716763005776e-05, 'epoch': 4.64}
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:39,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:45,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.054, 'learning_rate': 2.8439306358381497e-05, 'epoch': 4.64}
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:53,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:55:59,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:03,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▌ | 2072/2230 [6:56:50<26:53, 10.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:09,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:13,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:16,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:20,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:24,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:26,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:30,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:32,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0569, 'learning_rate': 2.757225433526011e-05, 'epoch': 4.65}
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:36,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:38,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:40,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:42,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:44,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:46,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:48,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:50,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:52,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:53,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:55,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:56:58,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:00,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:01,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:05,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:06,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:07,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:10,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:12,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:15,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:17,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:19,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:21,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:23,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:24,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:26,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:30,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:33,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:37,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:41,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:44,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:48,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:51,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:55,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:57:58,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:58:01,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:58:05,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:58:08,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:58:12,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0747, 'learning_rate': 2.5317919075144507e-05, 'epoch': 4.68}
{'loss': 0.0849, 'learning_rate': 2.514450867052023e-05, 'epoch': 4.68}
{'loss': 0.0875, 'learning_rate': 2.497109826589595e-05, 'epoch': 4.69}
{'loss': 0.1031, 'learning_rate': 2.4797687861271675e-05, 'epoch': 4.69}
{'loss': 0.1113, 'learning_rate': 2.4624277456647396e-05, 'epoch': 4.69}
{'loss': 0.0534, 'learning_rate': 2.445086705202312e-05, 'epoch': 4.69}
{'loss': 0.0803, 'learning_rate': 2.427745664739884e-05, 'epoch': 4.7}
94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]
{'loss': 0.0649, 'learning_rate': 2.4104046242774565e-05, 'epoch': 4.7}
{'loss': 0.0756, 'learning_rate': 2.393063583815029e-05, 'epoch': 4.7}
{'loss': 0.0671, 'learning_rate': 2.375722543352601e-05, 'epoch': 4.7}
{'loss': 0.0649, 'learning_rate': 2.3583815028901734e-05, 'epoch': 4.7}
94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]
{'loss': 0.0595, 'learning_rate': 2.3410404624277454e-05, 'epoch': 4.71}
{'loss': 0.0871, 'learning_rate': 2.323699421965318e-05, 'epoch': 4.71} 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0608, 'learning_rate': 2.30635838150289e-05, 'epoch': 4.71} 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0828, 'learning_rate': 2.2890173410404623e-05, 'epoch': 4.71} [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0578, 'learning_rate': 2.2716763005780347e-05, 'epoch': 4.72} [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]
{'loss': 0.0769, 'learning_rate': 2.2543352601156068e-05, 'epoch': 4.72}
{'loss': 0.0526, 'learning_rate': 2.2369942196531792e-05, 'epoch': 4.72}
 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]
{'loss': 0.0487, 'learning_rate': 2.2196531791907513e-05, 'epoch': 4.72}
{'loss': 0.0588, 'learning_rate': 2.2023121387283237e-05, 'epoch': 4.72}
{'loss': 0.0664, 'learning_rate': 2.184971098265896e-05, 'epoch': 4.73}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0583, 'learning_rate': 2.167630057803468e-05, 'epoch': 4.73}
{'loss': 0.0539, 'learning_rate': 2.1502890173410405e-05, 'epoch': 4.73}
{'loss': 0.0558, 'learning_rate': 2.1329479768786126e-05, 'epoch': 4.73}
{'loss': 0.0781, 'learning_rate': 2.115606936416185e-05, 'epoch': 4.74}
{'loss': 0.0674, 'learning_rate': 2.098265895953757e-05, 'epoch': 4.74}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:03:40,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0598, 'learning_rate': 2.0635838150289012e-05, 'epoch': 4.74} [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0864, 'learning_rate': 2.0462427745664736e-05, 'epoch': 4.74} [WARNING|modeling_utils.py:388] 2022-03-24 00:04:12,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:12,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:04:16,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:16,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0483, 'learning_rate': 2.028901734104046e-05, 'epoch': 4.75} [WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 95%|█████████████████████████████████████████████████████████████████████████▏ | 2118/2230 [7:05:12<20:33, 11.01s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0706, 'learning_rate': 1.9768786127167626e-05, 'epoch': 4.75} [WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:54,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:04:54,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:05:43<19:02, 10.48s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:05:43<19:02, 10.48s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0448, 'learning_rate': 1.942196531791907e-05, 'epoch': 4.76} [WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:13,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:13,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:16,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:16,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:19,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:19,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:05:23,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:05:25,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:06:11<17:14, 9.76s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:06:11<17:14, 9.76s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:05:29,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:05:29,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0411, 'learning_rate': 1.890173410404624e-05, 'epoch': 4.76} [WARNING|modeling_utils.py:388] 2022-03-24 00:05:39,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:41,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:05:43,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:45,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:47,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:48,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:50,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:52,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:57,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:05:59,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:00,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:03,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:05,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:08,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:09,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:11,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:13,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:15,469 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:17,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:19,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:21,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:24,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:25,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:27,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:31,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:35,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:38,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:42,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:45,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:48,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:52,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:55,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:06:59,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:02,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:06,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:09,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:12,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:16,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0863, 'learning_rate': 1.6647398843930635e-05, 'epoch': 4.79}
{'loss': 0.085, 'learning_rate': 1.6473988439306356e-05, 'epoch': 4.8}
{'loss': 0.0834, 'learning_rate': 1.630057803468208e-05, 'epoch': 4.8}
 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]
{'loss': 0.0748, 'learning_rate': 1.61271676300578e-05, 'epoch': 4.8}
{'loss': 0.0827, 'learning_rate': 1.5953757225433525e-05, 'epoch': 4.8}
{'loss': 0.0786, 'learning_rate': 1.578034682080925e-05, 'epoch': 4.8}
 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]
{'loss': 0.0794, 'learning_rate': 1.560693641618497e-05, 'epoch': 4.81}
{'loss': 0.0692, 'learning_rate': 1.5433526011560694e-05, 'epoch': 4.81}
{'loss': 0.091, 'learning_rate': 1.5260115606936414e-05, 'epoch': 4.81}
{'loss': 0.0572, 'learning_rate': 1.5086705202312138e-05, 'epoch': 4.81}
{'loss': 0.0749, 'learning_rate': 1.491329479768786e-05, 'epoch': 4.82}
 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]
{'loss': 0.0841, 'learning_rate': 1.4739884393063583e-05, 'epoch': 4.82}
{'loss': 0.07, 'learning_rate': 1.4566473988439304e-05, 'epoch': 4.82}
{'loss': 0.0935, 'learning_rate': 1.4393063583815026e-05, 'epoch': 4.82}
{'loss': 0.0645, 'learning_rate': 1.4219653179190749e-05, 'epoch': 4.83}
{'loss': 0.4197, 'learning_rate': 1.4046242774566473e-05, 'epoch': 4.83}
{'loss': 0.0902, 'learning_rate': 1.3872832369942195e-05, 'epoch': 4.83}
{'loss': 0.0462, 'learning_rate': 1.3699421965317917e-05, 'epoch': 4.83}
{'loss': 0.0749, 'learning_rate': 1.352601156069364e-05, 'epoch': 4.83}
97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
{'loss': 0.0734, 'learning_rate': 1.3179190751445084e-05, 'epoch': 4.84}
{'loss': 0.0546, 'learning_rate': 1.3005780346820809e-05, 'epoch': 4.84}
{'loss': 0.0643, 'learning_rate': 1.2832369942196531e-05, 'epoch': 4.84}
{'loss': 0.0613, 'learning_rate': 1.2658959537572253e-05, 'epoch': 4.85}
{'loss': 0.0652, 'learning_rate': 1.2485549132947976e-05, 'epoch': 4.85}
{'loss': 0.0587, 'learning_rate': 1.2312138728323698e-05, 'epoch': 4.85}
{'loss': 0.0617, 'learning_rate': 1.213872832369942e-05, 'epoch': 4.85}
97%|██████████████████████████████████████████████████████████████████████████▊ | 2165/2230 [7:13:39<12:23, 11.44s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
{'loss': 0.0527, 'learning_rate': 1.1965317919075144e-05, 'epoch': 4.85}
[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
{'loss': 0.0402, 'learning_rate': 1.1791907514450867e-05, 'epoch': 4.86}
{'loss': 0.0552, 'learning_rate': 1.161849710982659e-05, 'epoch': 4.86}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:21,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-24 00:13:26,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0607, 'learning_rate': 1.1445086705202312e-05, 'epoch': 4.86}
[WARNING|modeling_utils.py:388] 2022-03-24 00:13:33,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
97%|██████████████████████████████████████████████████████████████████████████▉ | 2169/2230 [7:14:22<10:55, 10.74s/it]
[WARNING|modeling_utils.py:388] 2022-03-24 00:13:44,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
97%|██████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [7:14:32<10:35, 10.58s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:52,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
97%|██████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [7:14:42<10:13, 10.40s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:02,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:07,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0433, 'learning_rate': 1.0751445086705203e-05, 'epoch': 4.87}
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:11,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:13,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
97%|███████████████████████████████████████████████████████████████████████████ | 2173/2230 [7:15:01<09:22, 9.86s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0471, 'learning_rate': 1.0578034682080925e-05, 'epoch': 4.87}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:23,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:25,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.043, 'learning_rate': 1.0404624277456646e-05, 'epoch': 4.87}
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:28,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:30,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:14:32,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0672, 'learning_rate': 1.0231213872832368e-05, 'epoch': 4.88}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:37,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:39,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:40,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [7:15:26<08:00, 8.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:44,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:46,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▏ | 2177/2230 [7:15:33<07:24, 8.38s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:50,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:51,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:53,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:54,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▏ | 2178/2230 [7:15:40<06:47, 7.84s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:56,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:59,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:01,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▏ | 2179/2230 [7:15:46<06:12, 7.30s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:02,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:05,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [7:15:51<05:34, 6.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:07,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:09,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:11,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [7:15:56<05:01, 6.15s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:14,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [7:16:01<04:29, 5.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:16,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:19,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:20,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:21,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▍ | 2184/2230 [7:16:08<03:26, 4.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:28,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:31,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:35,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▍ | 2185/2230 [7:16:22<05:34, 7.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:42,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:45,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:49,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▍ | 2186/2230 [7:16:36<06:50, 9.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:56,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:59,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:02,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▌ | 2187/2230 [7:16:49<07:34, 10.58s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0904, 'learning_rate': 8.15028901734104e-06, 'epoch': 4.9}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:09,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:13,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:16,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
98%|███████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [7:17:03<08:04, 11.54s/it]
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0813, 'learning_rate': 7.803468208092485e-06, 'epoch': 4.91}
98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it]
{'loss': 0.0839, 'learning_rate': 7.45664739884393e-06, 'epoch': 4.91}
98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it]
{'loss': 0.085, 'learning_rate': 7.109826589595374e-06, 'epoch': 4.92}
{'loss': 0.0905, 'learning_rate': 6.9364161849710975e-06, 'epoch': 4.92}
{'loss': 0.0731, 'learning_rate': 6.76300578034682e-06, 'epoch': 4.92} 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] {'loss': 0.069, 'learning_rate': 6.589595375722542e-06, 'epoch': 4.92}
{'loss': 0.0554, 'learning_rate': 6.4161849710982654e-06, 'epoch': 4.93} {'loss': 0.0552, 'learning_rate': 6.242774566473988e-06, 'epoch': 4.93}
{'loss': 0.0705, 'learning_rate': 6.06936416184971e-06, 'epoch': 4.93}
{'loss': 0.0627, 'learning_rate': 5.895953757225433e-06, 'epoch': 4.93}
{'loss': 0.0784, 'learning_rate': 5.722543352601156e-06, 'epoch': 4.93} {'loss': 0.0665, 'learning_rate': 5.549132947976878e-06, 'epoch': 4.94}
{'loss': 0.0605, 'learning_rate': 5.375722543352601e-06, 'epoch': 4.94}
{'loss': 0.0544, 'learning_rate': 5.202312138728323e-06, 'epoch': 4.94}
{'loss': 0.0643, 'learning_rate': 5.028901734104045e-06, 'epoch': 4.94}
99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it]
99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0734, 'learning_rate': 4.682080924855491e-06, 'epoch': 4.95} [WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
{'loss': 0.0615, 'learning_rate': 4.508670520231213e-06, 'epoch': 4.95}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.053, 'learning_rate': 4.335260115606936e-06, 'epoch': 4.95}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0534, 'learning_rate': 4.161849710982659e-06, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0597, 'learning_rate': 3.988439306358381e-06, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0839, 'learning_rate': 3.8150289017341036e-06, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0639, 'learning_rate': 3.641618497109826e-06, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0344, 'learning_rate': 3.4682080924855487e-06, 'epoch': 4.96}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it]
{'loss': 0.0682, 'learning_rate': 3.294797687861271e-06, 'epoch': 4.97}
[WARNING|modeling_utils.py:388] 2022-03-24 00:21:58,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0535, 'learning_rate': 3.121387283236994e-06, 'epoch': 4.97}
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:02,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.06, 'learning_rate': 2.9479768786127167e-06, 'epoch': 4.97}
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:17,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|████████████████████████████████████████████████████████████████████████████▌| 2218/2230 [7:23:03<02:06, 10.53s/it]
{'loss': 0.0719, 'learning_rate': 2.774566473988439e-06, 'epoch': 4.97}
[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:23,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:27,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0284, 'learning_rate': 2.6011560693641614e-06, 'epoch': 4.98}
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:33,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:22:40,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:22:40,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:43,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:46,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:46,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:48,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_utils.py:388] 2022-03-24 00:22:48,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:52,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:22:54,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:22:54,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|████████████████████████████████████████████████████████████████████████████▋| 2222/2230 [7:23:40<01:14, 9.27s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:22:58,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:00,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:01,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:01,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|████████████████████████████████████████████████████████████████████████████▊| 2223/2230 [7:23:47<01:01, 8.79s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:05,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|████████████████████████████████████████████████████████████████████████████▊| 2224/2230 [7:23:54<00:49, 8.29s/it] Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:12,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:14,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:15,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:15,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|████████████████████████████████████████████████████████████████████████████▊| 2225/2230 [7:24:01<00:39, 7.85s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:20,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:21,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:21,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:24,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:26,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:26,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|████████████████████████████████████████████████████████████████████████████▉| 2227/2230 [7:24:12<00:19, 6.49s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:30,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:30,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
100%|████████████████████████████████████████████████████████████████████████████▉| 2228/2230 [7:24:16<00:11, 5.80s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:34,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:34,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [WARNING|modeling_bart.py:1051] 2022-03-24 00:23:36,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:35,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|█████████████████████████████████████████████████████████████████████████████| 2230/2230 [7:24:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 100%|█████████████████████████████████████████████████████████████████████████████| 2230/2230 [7:24:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. {'loss': 0.0607, 'learning_rate': 6.936416184971098e-07, 'epoch': 5.0} [INFO|modeling_utils.py:1081] 2022-03-24 00:23:50,453 >> Model weights saved in ./pytorch_model.bin:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|modeling_utils.py:1081] 2022-03-24 00:24:02,084 >> Model weights saved in ./pytorch_model.bin:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 9%|█▏ | 32.0k/352k [00:00> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
Upload file runs/Mar23_16-58-49_sanchit--v100/events.out.tfevents.1648054754.sanchit--v100.1749532.0: 100%|██████| 352k/352k [01:52<00:00, 2.92kB/s]
03/24/2022 00:27:04 - WARNING - huggingface_hub.repository - remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
{'dataset': {'name': 'librispeech_asr', 'type': 'librispeech_asr', 'args': 'clean'}}
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
03/24/2022 00:27:23 - WARNING - huggingface_hub.repository - remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220323_165914-1vl16ira/run-1vl16ira.wandb: 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]
***** train metrics *****
  epoch                    =        5.0
  train_loss               =     1.6267
  train_runtime            = 7:24:24.66
  train_samples            =      28538
  train_samples_per_second =      5.351
  train_steps_per_second   =      0.084
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
[INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. [INFO|trainer.py:2369] 2022-03-24 00:27:26,021 >> Batch size = 8 100%|█████████████████████████████████████████| 219M/219M [00:11<00:00, 20.8MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
03/24/2022 00:36:44 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
***** eval metrics *****
  epoch                   = 5.0
  eval_loss               = 0.3226
  eval_runtime            = 0:09:18.18
  eval_samples            = 2642
  eval_samples_per_second = 4.733
  eval_steps_per_second   = 0.593
  eval_wer                = 0.0924
03/24/2022 00:37:28 - WARNING - huggingface_hub.repository - remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
remote: tput: No value for $TERM and no -T specified
To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220323_165914-1vl16ira/run-1vl16ira.wandb: 100%|█████████████████████████████████████████| 220M/220M [00:11<00:00, 20.9MB/s]
  File "/home/sanchit_huggingface_co/gcp/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 870, in model_info
File "/home/sanchit_huggingface_co/gcp/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 870, in model_info220M [00:11<00:00, 20.9MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. File "/home/sanchit_huggingface_co/gcp/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 870, in model_info220M [00:11<00:00, 20.9MB/s]03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.