diff --git "a/wandb/run-20220323_165914-1vl16ira/files/output.log" "b/wandb/run-20220323_165914-1vl16ira/files/output.log" --- "a/wandb/run-20220323_165914-1vl16ira/files/output.log" +++ "b/wandb/run-20220323_165914-1vl16ira/files/output.log" @@ -13857,3 +13857,1436 @@ 03/23/2022 23:41:07 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow {'eval_loss': 0.32974740862846375, 'eval_wer': 0.09492264974216581, 'eval_runtime': 555.2371, 'eval_samples_per_second': 4.758, 'eval_steps_per_second': 0.596, 'epoch': 4.48} 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0764, 'learning_rate': 4.040462427745664e-05, 'epoch': 4.49} + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 2/331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0685, 'learning_rate': 4.023121387283236e-05, 'epoch': 4.49} + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.096, 'learning_rate': 4.005780346820808e-05, 'epoch': 4.49} + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0908, 'learning_rate': 3.988439306358381e-05, 'epoch': 4.49} + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████▍ | 2002/2230 [6:43:55<9:30:10, 150.05s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0824, 'learning_rate': 3.953757225433525e-05, 'epoch': 4.5} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0638, 'learning_rate': 3.936416184971098e-05, 'epoch': 4.5} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0703, 'learning_rate': 3.91907514450867e-05, 'epoch': 4.5} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0662, 'learning_rate': 3.901734104046242e-05, 'epoch': 4.5} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0581, 'learning_rate': 3.884393063583814e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0755, 'learning_rate': 3.867052023121387e-05, 'epoch': 4.51} + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████▍ | 2005/2230 [6:44:34<3:44:56, 59.98s/it]331 [00:01<02:58, 1.84it/s].e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0439, 'learning_rate': 3.849710982658959e-05, 'epoch': 4.51} +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:16,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0588, 'learning_rate': 3.832369942196531e-05, 'epoch': 4.51} + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0649, 'learning_rate': 3.815028901734104e-05, 'epoch': 4.52} + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▌ | 2013/2230 [6:46:12<53:14, 14.72s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:53,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:53,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.071, 'learning_rate': 3.780346820809248e-05, 'epoch': 4.52} +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.09, 'learning_rate': 3.76300578034682e-05, 'epoch': 4.52} +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:45:57,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:46:21,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|█████████████████████████████████████████████████████████████████████▋ | 2018/2230 [6:47:08<41:05, 11.63s/it] Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|█████████████████████████████████████████████████████████████████████▋ | 2018/2230 [6:47:08<41:05, 11.63s/it] Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0448, 'learning_rate': 3.745664739884393e-05, 'epoch': 4.52} +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0613, 'learning_rate': 3.728323699421965e-05, 'epoch': 4.53} +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:37,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:46:46,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:46:46,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:50,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:50,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:50,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|█████████████████████████████████████████████████████████████████████▊ | 2021/2230 [6:47:39<37:26, 10.75s/it]g-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:46:57,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:57,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:46:57,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:03,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:03,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0628, 'learning_rate': 3.6763005780346816e-05, 'epoch': 4.53} +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:03,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:09,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:09,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:47:13,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:47:13,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0486, 'learning_rate': 3.658959537572254e-05, 'epoch': 4.54} +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:17,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:19,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:19,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:27:22,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 91%|█████████████████████████████████████████████████████████████████████▉ | 2024/2230 [6:48:07<33:53, 9.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|█████████████████████████████████████████████████████████████████████▉ | 2024/2230 [6:48:07<33:53, 9.87s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0715, 'learning_rate': 3.6416184971098265e-05, 'epoch': 4.54} +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:27,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:29,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:47:29,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:29,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0677, 'learning_rate': 3.6242774566473985e-05, 'epoch': 4.54} +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:35,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:37,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:47:39,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:39,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:41,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:43,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:44,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:46,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:46,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:48,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:52,050 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:47:53,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:53,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:55,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:56,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:59,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:47:59,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:01,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:04,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:05,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:48:05,378 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:07,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:09,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:09,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:11,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:13,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:13,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:15,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:18,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:48:18,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:19,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:21,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:21,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:23,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:23,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:27,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:27,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:30,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:48:34,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:34,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:34,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:37,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:37,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:41,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:41,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:44,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:48,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:51,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:51,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:55,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:58,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:48:58,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:02,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:02,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.1258, 'learning_rate': 3.4161849710982654e-05, 'epoch': 4.57}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:05,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:08,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:08,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:12,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:12,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:15,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:15,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.1049, 'learning_rate': 3.38150289017341e-05, 'epoch': 4.57}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0621, 'learning_rate': 3.364161849710982e-05, 'epoch': 4.57}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0679, 'learning_rate': 3.346820809248554e-05, 'epoch': 4.58} +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0786, 'learning_rate': 3.329479768786127e-05, 'epoch': 4.58} +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0961, 'learning_rate': 3.312138728323699e-05, 'epoch': 4.58} +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:49:19,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.098, 'learning_rate': 3.294797687861271e-05, 'epoch': 4.58} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0582, 'learning_rate': 3.277456647398844e-05, 'epoch': 4.59} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0676, 'learning_rate': 3.260115606936416e-05, 'epoch': 4.59} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0756, 'learning_rate': 3.242774566473988e-05, 'epoch': 4.59} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0821, 'learning_rate': 3.22543352601156e-05, 'epoch': 4.59} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0867, 'learning_rate': 3.208092485549133e-05, 'epoch': 4.59} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0538, 'learning_rate': 3.190751445086705e-05, 'epoch': 4.6} + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▌ | 2044/2230 [6:51:21<40:16, 12.99s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0831, 'learning_rate': 3.173410404624277e-05, 'epoch': 4.6} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0608, 'learning_rate': 3.15606936416185e-05, 'epoch': 4.6} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.067, 'learning_rate': 3.138728323699422e-05, 'epoch': 4.6} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0615, 'learning_rate': 3.121387283236994e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0613, 'learning_rate': 3.1040462427745667e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0709, 'learning_rate': 3.086705202312139e-05, 'epoch': 4.61} + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|██████████████████████████████████████████████████████████████████████▊ | 2051/2230 [6:52:50<37:56, 12.72s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0792, 'learning_rate': 3.069364161849711e-05, 'epoch': 4.61} + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0584, 'learning_rate': 3.052023121387283e-05, 'epoch': 4.61} + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.071, 'learning_rate': 3.0346820809248553e-05, 'epoch': 4.62} + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0718, 'learning_rate': 3.0173410404624277e-05, 'epoch': 4.62} + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0671, 'learning_rate': 2.9999999999999997e-05, 'epoch': 4.62} + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2061/2230 [6:54:51<33:29, 11.89s/it]g-point operations will not be computed-23 23:47:23,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0882, 'learning_rate': 2.982658959537572e-05, 'epoch': 4.62} + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|███████████████████████████████████████████████████████████████████████▏ | 2062/2230 [6:55:03<33:01, 11.80s/it][WARNING|modeling_bart.py:1051] 2022-03-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0676, 'learning_rate': 2.9653179190751446e-05, 'epoch': 4.63} +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0508, 'learning_rate': 2.9479768786127166e-05, 'epoch': 4.63} +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:54:33,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:52,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:52,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:52,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:56,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:56,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:56,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:54:56,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [6:55:48<31:02, 11.36s/it]
+ 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [6:55:48<31:02, 11.36s/it]
+{'loss': 0.0594, 'learning_rate': 2.9132947976878608e-05, 'epoch': 4.63}
+ 93%|███████████████████████████████████████████████████████████████████████▎ | 2066/2230 [6:55:48<31:02, 11.36s/it]
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:10,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:10,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.071, 'learning_rate': 2.895953757225433e-05, 'epoch': 4.63}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:14,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:25,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:25,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.053, 'learning_rate': 2.8786127167630052e-05, 'epoch': 4.64}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:29,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.0668, 'learning_rate': 2.8612716763005776e-05, 'epoch': 4.64}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:39,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:39,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:39,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:45,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:45,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:45,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 0.054, 'learning_rate': 2.8439306358381497e-05, 'epoch': 4.64}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:45,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:53,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:53,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:53,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:53,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:59,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:55:59,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:03,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:03,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|███████████████████████████████████████████████████████████████████████▌ | 2072/2230 [6:56:50<26:53, 10.21s/it]
+ 93%|███████████████████████████████████████████████████████████████████████▌ | 2072/2230 [6:56:50<26:53, 10.21s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:09,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:09,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:13,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:13,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:16,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:16,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:20,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:20,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:24,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:24,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:26,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:26,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:30,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:32,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-23 23:56:32,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 0.0569, 'learning_rate': 2.757225433526011e-05, 'epoch': 4.65}
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:36,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:38,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:40,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:40,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:42,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:44,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:46,479 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:48,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:48,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:50,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:52,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:53,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:55,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:55,457 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:56:58,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:00,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:01,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:01,952 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:05,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:06,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:07,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:07,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:10,533 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:12,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:12,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:15,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:17,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:17,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:19,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:21,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:21,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:23,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:24,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:24,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:26,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:26,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:30,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:30,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:33,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:37,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:37,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:37,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:41,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:41,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:44,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:44,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:48,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:51,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:51,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:57:51,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:55,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:55,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:57:58,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:01,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:01,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:05,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:05,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:05,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:08,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:12,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:12,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0747, 'learning_rate': 2.5317919075144507e-05, 'epoch': 4.68} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0849, 'learning_rate': 2.514450867052023e-05, 'epoch': 4.68} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0875, 'learning_rate': 2.497109826589595e-05, 'epoch': 4.69} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.1031, 'learning_rate': 2.4797687861271675e-05, 'epoch': 4.69} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.1113, 'learning_rate': 2.4624277456647396e-05, 'epoch': 4.69} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0534, 'learning_rate': 2.445086705202312e-05, 'epoch': 4.69} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0803, 'learning_rate': 2.427745664739884e-05, 'epoch': 4.7} +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-23 23:58:15,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0649, 'learning_rate': 2.4104046242774565e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.0756, 'learning_rate': 2.393063583815029e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0671, 'learning_rate': 2.375722543352601e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0649, 'learning_rate': 2.3583815028901734e-05, 'epoch': 4.7} + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▎ | 2095/2230 [7:00:37<29:07, 12.94s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0595, 'learning_rate': 2.3410404624277454e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0871, 'learning_rate': 2.323699421965318e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0608, 'learning_rate': 2.30635838150289e-05, 'epoch': 4.71} + 94%|████████████████████████████████████████████████████████████████████████▍ | 2099/2230 [7:01:28<27:48, 12.74s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0828, 'learning_rate': 2.2890173410404623e-05, 'epoch': 4.71} +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0578, 'learning_rate': 2.2716763005780347e-05, 'epoch': 4.72} +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:01:15,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0769, 'learning_rate': 2.2543352601156068e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0526, 'learning_rate': 2.2369942196531792e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▋ | 2104/2230 [7:02:30<26:08, 12.45s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0487, 'learning_rate': 2.2196531791907513e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0588, 'learning_rate': 2.2023121387283237e-05, 'epoch': 4.72} + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0664, 'learning_rate': 2.184971098265896e-05, 'epoch': 4.73} + 94%|████████████████████████████████████████████████████████████████████████▋ | 2106/2230 [7:02:54<25:18, 12.25s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0583, 'learning_rate': 2.167630057803468e-05, 'epoch': 4.73} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0539, 'learning_rate': 2.1502890173410405e-05, 'epoch': 4.73} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0558, 'learning_rate': 2.1329479768786126e-05, 'epoch': 4.73} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0781, 'learning_rate': 2.115606936416185e-05, 'epoch': 4.74} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0674, 'learning_rate': 2.098265895953757e-05, 'epoch': 4.74} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:02:39,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:03:40,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:03:40,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:03:40,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:03:40,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0598, 'learning_rate': 2.0635838150289012e-05, 'epoch': 4.74} +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:03:47,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0864, 'learning_rate': 2.0462427745664736e-05, 'epoch': 4.74} +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:12,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:12,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:04:16,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:16,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0483, 'learning_rate': 2.028901734104046e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:20,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|█████████████████████████████████████████████████████████████████████████▏ | 2118/2230 [7:05:12<20:33, 11.01s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:30,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0706, 'learning_rate': 1.9768786127167626e-05, 'epoch': 4.75} +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:48,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:54,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:04:54,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:05:43<19:02, 10.48s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|█████████████████████████████████████████████████████████████████████████▏ | 2121/2230 [7:05:43<19:02, 10.48s/it]g-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:01,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0448, 'learning_rate': 1.942196531791907e-05, 'epoch': 4.76} +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:07,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:13,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:13,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:16,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:16,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:19,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:05:19,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:05:23,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:05:25,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-23 23:54:19,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:06:11<17:14, 9.76s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|█████████████████████████████████████████████████████████████████████████▎ | 2124/2230 [7:06:11<17:14, 9.76s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:05:29,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:05:29,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:05:33,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0411, 'learning_rate': 1.890173410404624e-05, 'epoch': 4.76} +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:39,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:41,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:43,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:05:43,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:45,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:47,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:48,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:50,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:50,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:52,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:55,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:57,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:05:57,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:05:59,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:00,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:03,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:03,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:05,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:08,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:09,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:09,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:06:11,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:13,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:13,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:15,469 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:17,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:19,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:19,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:21,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:21,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:06:24,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:25,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:25,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:27,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:27,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:31,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:31,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:35,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:38,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:06:38,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:38,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:42,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:42,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:45,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:48,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:52,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-24 00:06:52,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:55,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:06:59,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:02,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:02,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:06,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-24 00:07:09,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:12,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:16,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0863, 'learning_rate': 1.6647398843930635e-05, 'epoch': 4.79} +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length.
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.085, 'learning_rate': 1.6473988439306356e-05, 'epoch': 4.8} +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.0834, 'learning_rate': 1.630057803468208e-05, 'epoch': 4.8} +[WARNING|modeling_utils.py:388] 2022-03-24 00:07:19,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0748, 'learning_rate': 1.61271676300578e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.0827, 'learning_rate': 1.5953757225433525e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0786, 'learning_rate': 1.578034682080925e-05, 'epoch': 4.8} + 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|█████████████████████████████████████████████████████████████████████████▉ | 2141/2230 [7:08:46<18:56, 12.76s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0794, 'learning_rate': 1.560693641618497e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length.
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0692, 'learning_rate': 1.5433526011560694e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.091, 'learning_rate': 1.5260115606936414e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0572, 'learning_rate': 1.5086705202312138e-05, 'epoch': 4.81} + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0749, 'learning_rate': 1.491329479768786e-05, 'epoch': 4.82} + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 2144/2230 [7:09:25<18:35, 12.97s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0841, 'learning_rate': 1.4739884393063583e-05, 'epoch': 4.82} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.07, 'learning_rate': 1.4566473988439304e-05, 'epoch': 4.82} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0935, 'learning_rate': 1.4393063583815026e-05, 'epoch': 4.82} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0645, 'learning_rate': 1.4219653179190749e-05, 'epoch': 4.83} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.4197, 'learning_rate': 1.4046242774566473e-05, 'epoch': 4.83} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0902, 'learning_rate': 1.3872832369942195e-05, 'epoch': 4.83} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0462, 'learning_rate': 1.3699421965317917e-05, 'epoch': 4.83} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0749, 'learning_rate': 1.352601156069364e-05, 'epoch': 4.83} + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 2149/2230 [7:10:29<17:09, 12.71s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0734, 'learning_rate': 1.3179190751445084e-05, 'epoch': 4.84} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0546, 'learning_rate': 1.3005780346820809e-05, 'epoch': 4.84} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0643, 'learning_rate': 1.2832369942196531e-05, 'epoch': 4.84} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0613, 'learning_rate': 1.2658959537572253e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0652, 'learning_rate': 1.2485549132947976e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0587, 'learning_rate': 1.2312138728323698e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0617, 'learning_rate': 1.213872832369942e-05, 'epoch': 4.85} + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 2157/2230 [7:12:07<14:46, 12.15s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▊ | 2165/2230 [7:13:39<12:23, 11.44s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▊ | 2165/2230 [7:13:39<12:23, 11.44s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0527, 'learning_rate': 1.1965317919075144e-05, 'epoch': 4.85} +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0402, 'learning_rate': 1.1791907514450867e-05, 'epoch': 4.86} +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:12:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0552, 'learning_rate': 1.161849710982659e-05, 'epoch': 4.86} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:21,889 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:21,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:26,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:26,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0607, 'learning_rate': 1.1445086705202312e-05, 'epoch': 4.86} +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:26,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:26,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:33,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:33,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:33,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▉ | 2169/2230 [7:14:22<10:55, 10.74s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2169/2230 [7:14:22<10:55, 10.74s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2169/2230 [7:14:22<10:55, 10.74s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:44,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:44,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:13:44,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [7:14:32<10:35, 10.58s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2170/2230 [7:14:32<10:35, 10.58s/it]g-point operations will not be computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:52,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:52,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:52,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:13:52,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:05:27,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [7:14:42<10:13, 10.40s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 2171/2230 [7:14:42<10:13, 10.40s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:02,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:02,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:07,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:07,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0433, 'learning_rate': 1.0751445086705203e-05, 'epoch': 4.87} +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:11,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:13,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:13,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:13:58,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|███████████████████████████████████████████████████████████████████████████ | 2173/2230 [7:15:01<09:22, 9.86s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|███████████████████████████████████████████████████████████████████████████ | 2173/2230 [7:15:01<09:22, 9.86s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0471, 'learning_rate': 1.0578034682080925e-05, 'epoch': 4.87} + 97%|███████████████████████████████████████████████████████████████████████████ | 2173/2230 [7:15:01<09:22, 9.86s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:23,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:25,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:25,196 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.043, 'learning_rate': 1.0404624277456646e-05, 'epoch': 4.87} +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:28,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:30,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:32,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:14:32,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0672, 'learning_rate': 1.0231213872832368e-05, 'epoch': 4.88} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:37,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:39,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:40,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:40,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:14:17,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▏ | 2176/2230 [7:15:26<08:00, 8.89s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:44,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:46,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▏ | 2177/2230 [7:15:33<07:24, 8.38s/it] Setting `use_cache=False`...1] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▏ | 2177/2230 [7:15:33<07:24, 8.38s/it] Setting `use_cache=False`...1] 2022-03-24 00:14:42,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:51,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:50,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:53,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:50,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:54,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:50,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:54,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:50,049 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▏ | 2178/2230 [7:15:40<06:47, 7.84s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:14:56,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:14:59,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:56,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:01,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:56,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:01,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:14:56,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▏ | 2179/2230 [7:15:46<06:12, 7.30s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:02,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:05,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:02,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [7:15:51<05:34, 6.70s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:02,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 2180/2230 [7:15:51<05:34, 6.70s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:02,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:09,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:07,802 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:11,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:07,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:11,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:07,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 2181/2230 [7:15:56<05:01, 6.15s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:14,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:14,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 2182/2230 [7:16:01<04:29, 5.62s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:16,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:19,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:16,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:19,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:16,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:21,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:20,638 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2184/2230 [7:16:08<03:26, 4.48s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:20,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2184/2230 [7:16:08<03:26, 4.48s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:20,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2184/2230 [7:16:08<03:26, 4.48s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2184/2230 [7:16:08<03:26, 4.48s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:28,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:31,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:31,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:35,416 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2185/2230 [7:16:22<05:34, 7.43s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2185/2230 [7:16:22<05:34, 7.43s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:24,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2185/2230 [7:16:22<05:34, 7.43s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:42,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:42,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:45,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:45,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:49,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2186/2230 [7:16:36<06:50, 9.32s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2186/2230 [7:16:36<06:50, 9.32s/it] Setting `use_cache=False`...1] 2022-03-24 00:15:38,977 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 2186/2230 [7:16:36<06:50, 9.32s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:56,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:56,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:15:59,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:02,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:02,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:15:52,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2187/2230 [7:16:49<07:34, 10.58s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2187/2230 [7:16:49<07:34, 10.58s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0904, 'learning_rate': 8.15028901734104e-06, 'epoch': 4.9} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:09,505 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:09,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:13,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:16,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [7:17:03<08:04, 11.54s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [7:17:03<08:04, 11.54s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:06,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2188/2230 [7:17:03<08:04, 11.54s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0813, 'learning_rate': 7.803468208092485e-06, 'epoch': 4.91} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:16:23,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0839, 'learning_rate': 7.45664739884393e-06, 'epoch': 4.91} + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 2190/2230 [7:17:29<08:16, 12.41s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.085, 'learning_rate': 7.109826589595374e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0905, 'learning_rate': 6.9364161849710975e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0731, 'learning_rate': 6.76300578034682e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 2192/2230 [7:17:56<08:06, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.069, 'learning_rate': 6.589595375722542e-06, 'epoch': 4.92} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0554, 'learning_rate': 6.4161849710982654e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0552, 'learning_rate': 6.242774566473988e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0705, 'learning_rate': 6.06936416184971e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0627, 'learning_rate': 5.895953757225433e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0784, 'learning_rate': 5.722543352601156e-06, 'epoch': 4.93} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0665, 'learning_rate': 5.549132947976878e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0605, 'learning_rate': 5.375722543352601e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0544, 'learning_rate': 5.202312138728323e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0643, 'learning_rate': 5.028901734104045e-06, 'epoch': 4.94} + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 2196/2230 [7:18:47<07:15, 12.80s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 2206/2230 [7:20:50<04:50, 12.10s/it] Setting `use_cache=False`...1] 2022-03-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0734, 'learning_rate': 4.682080924855491e-06, 'epoch': 4.95} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0615, 'learning_rate': 4.508670520231213e-06, 'epoch': 4.95} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.053, 'learning_rate': 4.335260115606936e-06, 'epoch': 4.95} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0534, 'learning_rate': 4.161849710982659e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0597, 'learning_rate': 3.988439306358381e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0839, 'learning_rate': 3.8150289017341036e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.0639, 'learning_rate': 3.641618497109826e-06, 'epoch': 4.96} +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:20:20,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0344, 'learning_rate': 3.4682080924855487e-06, 'epoch': 4.96} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:21:34,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0682, 'learning_rate': 3.294797687861271e-06, 'epoch': 4.97} + 99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. 
If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▍| 2215/2230 [7:22:33<02:47, 11.18s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:21:58,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:21:58,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0535, 'learning_rate': 3.121387283236994e-06, 'epoch': 4.97} +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:02,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:22:02,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:02,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 0.06, 'learning_rate': 2.9479768786127167e-06, 'epoch': 4.97} +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:09,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:17,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▌| 2218/2230 [7:23:03<02:06, 10.53s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 99%|████████████████████████████████████████████████████████████████████████████▌| 2218/2230 [7:23:03<02:06, 10.53s/it] Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0719, 'learning_rate': 2.774566473988439e-06, 'epoch': 4.97} +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:23,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:23,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:27,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:22:27,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:27,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 0.0284, 'learning_rate': 2.6011560693641614e-06, 'epoch': 4.98} +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:33,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:35,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:40,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:40,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:43,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:46,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:46,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:48,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_utils.py:388] 2022-03-24 00:22:48,372 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:52,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:54,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:54,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-24 00:16:20,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▋| 2222/2230 [7:23:40<01:14, 9.27s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:22:58,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:00,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:01,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:01,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:22:56,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▊| 2223/2230 [7:23:47<01:01, 8.79s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:05,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▊| 2224/2230 [7:23:54<00:49, 8.29s/it] Setting `use_cache=False`...1] 2022-03-24 00:23:03,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:12,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:14,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:15,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:15,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:10,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▊| 2225/2230 [7:24:01<00:39, 7.85s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:20,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:21,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:21,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:17,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:24,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:26,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:26,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:23,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|████████████████████████████████████████████████████████████████████████████▉| 2227/2230 [7:24:12<00:19, 6.49s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:30,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:30,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:28,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|████████████████████████████████████████████████████████████████████████████▉| 2228/2230 [7:24:16<00:11, 5.80s/it][WARNING|modeling_bart.py:1051] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:34,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:34,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:32,199 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[WARNING|modeling_bart.py:1051] 2022-03-24 00:23:36,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-24 00:23:35,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|█████████████████████████████████████████████████████████████████████████████| 2230/2230 [7:24:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|█████████████████████████████████████████████████████████████████████████████| 2230/2230 [7:24:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 0.0607, 'learning_rate': 6.936416184971098e-07, 'epoch': 5.0}
+[INFO|modeling_utils.py:1081] 2022-03-24 00:23:50,453 >> Model weights saved in ./pytorch_model.bin:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+[INFO|modeling_utils.py:1081] 2022-03-24 00:24:02,084 >> Model weights saved in ./pytorch_model.bin:23<00:00, 4.53s/it][INFO|trainer.py:1492] 2022-03-24 00:23:38,677 >> 5,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...coderModel.forward` and have been ignored: input_length. If input_length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.